
NUCLEIC ACID MOLECULES AND OTHER MOLECULES 
ASSOCIATED WITH STEROL SYNTHESIS AND METABOLISM 

FIELD OF THE INVENTION 

This invention relates to the field of biotechnology, particularly as it pertains to the
production of sterols in a variety of host systems, particularly plants. More specifically, the
invention relates to nucleic acid molecules encoding proteins and fragments of proteins
associated with sterol and phytosterol metabolism, as well as the encoded proteins and fragments
of proteins and antibodies capable of binding to them. The invention also relates to methods of
using the nucleic acid molecules, fragments of the nucleic acid molecules, proteins, and
fragments of proteins. The invention also relates to cells, organisms, particularly plants, or seeds,
or progeny of plants, that have been manipulated to contain increased levels of, or to overexpress,
at least one sterol or phytosterol compound.

BACKGROUND OF THE INVENTION 
Sterols are a class of essential, natural compounds required by all eukaryotes to complete
their life cycle. The types of sterols produced and predominantly present within each of the
phylogenetic kingdoms vary. Plants produce a class of sterols called phytosterols, among which
sitosterol predominates. In animals, cholesterol is typically the major sterol,
while in fungi it is ergosterol.

Phytosterols from plants possess a wide spectrum of biological activities in animals and
humans. Phytosterols are considered efficacious cholesterol-lowering agents (Pelletier et al.,
Annals Nutrit. Metab. 39:291-295 (1995), the entirety of which is herein incorporated by
reference). Lower cholesterol levels are linked to a reduction in the risk of cardiovascular
disease. Phytosterols can also block cholesterol absorption in the intestine, which would also
lead to lower cholesterol levels. Thus, enhancing the levels of phytosterols in edible plants and
seeds, or products derived from these plants and seeds, may lead to food products with increased
nutritive or therapeutic value.



In one aspect, this invention provides these desirable plants and seeds as well as methods 
to produce them. Since, as will be discussed below, the genetic manipulation made possible by 
this invention involves families of related genes that cross phylogenetic boundaries, the effects 
are not limited to plants alone. 
Biochemistry of Sterol Synthesis

A number of the important sterol biosynthetic enzymes, reactions, and intermediates have 
been described. Sterol synthesis uses acetyl CoA as the basic carbon building block. Multiple 
acetyl CoA molecules form the five-carbon isoprene units, hence the name isoprenoid pathway. 
Enzymatic combination of isoprene units leads to the thirty-carbon squalene molecule, which is 
the penultimate precursor to sterols.

In plants, animals, and fungi, the reactions proceed as: acetyl CoA →
HMGCoA, mevalonate, mevalonate 5-phosphate, mevalonate 5-pyrophosphate, isopentyl
diphosphate, 5-pyrophosphatemevalonate, isopentyl pyrophosphate (PIP), dimethylallyl
pyrophosphate (DMAPP), PIP + DMAPP, geranyl pyrophosphate + IPP, farnesyl pyrophosphate,
2 farnesyl pyrophosphate, squalene and squalene epoxide.

From squalene epoxide, the sterol biosynthesis pathway of plants diverges from that of 
animals and fungi. In plants, cycloartenol is produced next by cyclization of squalene epoxide. 
The plant pathway eventually leads to the synthesis of the predominant phytosterol, sitosterol. 

Animals go on to produce lanosterol from squalene epoxide, eventually leading to 
cholesterol, which is the precursor to steroid hormones and bile acids, among other compounds.
In fungi, lanosterol leads to the production of the predominant sterol, ergosterol. 

Important regulatory control steps within the pathway are the HMGCoA →
mevalonate step, catalyzed by HMGCoA reductase, and the condensation of 2 farnesyl
pyrophosphates → squalene, catalyzed by squalene synthase. An early, reported rate-limiting
step in the pathway is the HMGCoA reductase-catalyzed reaction.

A number of studies have focused on the regulation of HMGCoA reductase. HMGCoA
reductase (EC 1.1.1.34) catalyzes the reductive conversion of HMGCoA to mevalonic acid
(MVA). This reaction is a reported controlling step in isoprenoid biosynthesis. The enzyme is
regulated by feedback mechanisms and by a system of activation kinases and phosphatases
(Gray, Adv. Bot. Res., 14: 25 (1987); Bach et al., Lipids, 26: 637 (1991); Stermer et al., J. Lipid
Res., 35: 1133 (1994), all of which are herein incorporated by reference in their entirety).

Another important regulation occurs at the squalene synthase step. Squalene synthase
(EC 2.5.1.21) reductively condenses two molecules of farnesyl pyrophosphate (FPP) in the
presence of Mg2+ and NADPH to form squalene. The reaction involves a head-to-head
condensation and forms a stable intermediate, presqualene diphosphate. The enzyme is subject to
regulation similar to that of HMGCoA reductase and acts by balancing the incorporation of FPP
into sterols and other compounds.

The sterol pathway of plants diverges from that in animals and fungi after squalene
epoxide. In plants, the cyclization of squalene epoxide occurs next, under the regulated control
of cycloartenol synthase (EC 5.4.99.8). The cyclization mechanism proceeds from the epoxy end
into a chair-boat-chair-boat sequence that is mediated by a transient C-20 carbocationic
intermediate. The reported rate-limiting step in plant sterol synthesis is the next step, in which
S-adenosyl-L-methionine:sterol C-24 methyl transferase (EC 2.1.1.41) (SMTI) catalyzes the
transfer of a methyl group from a cofactor, S-adenosyl-L-methionine, to the C-24 center of the
sterol side chain. This is the first of two methyl transfer reactions. The second methyl transfer
reaction occurs further down in the pathway and has been reported to be catalyzed by SMTII. An
isoform enzyme, SMTII, catalyzes the conversion of 24-methylene lophenol to 24-ethylidene
lophenol (Fonteneau et al., Plant Sci Lett 10:147-155 (1977), the entirety of which is herein
incorporated by reference). The presence of two distinct SMTs in plants was further confirmed
by cloning cDNAs encoding the enzymes from Arabidopsis (Husselstein et al., FEBS Lett
381:87-92 (1996), the entirety of which is herein incorporated by reference), soybean (Shi et al.,
J Biol Chem 271: 9384-9389 (1996), the entirety of which is herein incorporated by reference),
maize (Grebenok et al., Plant Mol Biol 34: 891-896 (1997), the entirety of which is herein
incorporated by reference) and tobacco (Bouvier-Nave et al., Eur J Biochem 246: 518-529 (1997);
Bouvier-Nave et al., Eur J Biochem 256: 88-96 (1998), both of which are herein incorporated by
reference in their entirety).

Later in the pathway, a sterol C-14 demethylase catalyzes the demethylation at C-14,
removing the methyl group and creating a double bond. Interestingly, this enzyme also occurs in
plants and fungi, but at a different point in the pathway. Sterol C14-demethylation is mediated
by a cytochrome P-450 complex. A large family of enzymes utilizes the cytochrome P-450
complex. There is, in addition, a family of cytochrome P450 complexes. For example, sterol
C-22 desaturase (EC 2.7.3.9) catalyzes the formation of the double bond at C-22 on the side chain.
The C-22 desaturase in yeast, which is the final step in the biosynthesis of ergosterol, contains a
cytochrome P450 that is distinct from the cytochrome P450 participating in the demethylation
reaction. Additional cytochrome P450 enzymes participate in brassinosteroid synthesis (Bishop,
Plant Cell 8:959-969 (1996), the entirety of which is herein incorporated by reference).
Brassinosteroids are steroidal compounds with plant growth regulatory properties, including
modulation of cell expansion and photomorphogenesis (Artecal, Plant Hormones, Physiology,
Biochemistry and Molecular Biology ed. Davies, Kluwer Academic Publishers, Dordrecht, 66
(1995), Yakota, Trends in Plant Science 2:137-143 (1997), both of which are herein incorporated
by reference in their entirety).

One class of proteins, oxysterol-binding proteins, has been reported in humans and yeast
(Jiang et al., Yeast 10: 341-353 (1994), the entirety of which is herein incorporated by reference).
These proteins have been reported to modulate ergosterol levels in yeast (Jiang et al., Yeast 10:
341-353 (1994)). In particular, Jiang et al. reported three genes, KES1, HES1 and OSH1, which
encode proteins containing an oxysterol-binding region.
Enzyme Inhibitors and Modulators 

Self-regulatory and feedback regulatory mechanisms of some of the sterol synthesis
enzymes provide opportunities to affect sterol metabolism. For example, the introduction of the
feedback inhibitor molecule inhibits enzyme action while the removal of that molecule
up-regulates the enzyme. In certain circumstances, non-wild type enzymes can affect normal
regulation. Organisms carrying such enzymes can be generated, for example, by traditional
genetic crosses, mutation treatments and through molecular genetics. One example is the
overexpression of plant HMGCoA reductase in transgenic plants resulting in a 6-10 fold increase
in the total sterol levels (for example, transgenic tobacco plants overproducing phytosterols in
Schaller et al., Plant Physiol. 109: 761 (1995), the entirety of which is herein incorporated by
reference).

A number of compounds have been identified that, at least partially, exert their effects on
sterol synthesis. For example, mevinolinic acid and lovastatin are competitive inhibitors of
HMGCoA reductase and zaragozic acid is a competitive inhibitor of squalene synthase (Alberts
et al., Proc. Natl. Acad. Sci. (U.S.A.) 77:3957-61 (1980); Bergstrom et al., Proc. Natl. Acad. Sci.
(U.S.A.) 90:80-84 (1993), both of which are herein incorporated by reference). Many fungicides
and insecticides act by inhibiting enzymes, such as those noted above or the C-14 demethylase
enzyme (Sterol Biosynthesis Inhibitors and Anti-feeding Compounds, Kato et al., Springer-
Verlag, New York (1986); Sterol biosynthesis inhibitors: pharmaceutical and agrochemical
aspects, eds. Berg and Plempel, Ellis Horwood, Chichester, England (1988), both of which are
herein incorporated by reference in their entirety).

However, the use of these compounds can have toxic effects that preclude their use in
products destined for animal or human consumption. Furthermore, the increase or decrease in
sterol levels possible using these compounds is limited. Typically, the changes in levels occur
over a wide spectrum of the pathway. New and more effective methods for manipulating sterol
synthesis are desired.

The present invention provides a gene, HES1, involved in plant phytosterol production.
Expression of HES1 (protein) in organisms such as plants can increase phytosterol biosynthesis. 
The present invention also provides transgenic organisms expressing a HES1 protein, which can 
enhance food and feed sources. 

SUMMARY OF THE INVENTION

The present invention includes a substantially purified nucleic acid molecule that encodes 
a protein comprising the amino acid sequence of SEQ ID NO: 622. 



The present invention includes a substantially purified nucleic acid molecule that 
specifically hybridizes to a nucleic acid sequence of SEQ ID NO: 1 or its complement, wherein 
the nucleic acid molecule encodes a protein comprising the amino acid sequence of SEQ ID NO: 
622. 

The present invention includes a substantially purified nucleic acid molecule that encodes
a protein comprising the amino acid sequence of SEQ ID NO: 623.

The present invention includes a substantially purified nucleic acid molecule that 
specifically hybridizes to a nucleic acid sequence of SEQ ID NO: 2 or its complement, wherein 
the nucleic acid molecule encodes a protein comprising the amino acid sequence of SEQ ID NO: 
623.

The present invention includes a substantially purified nucleic acid molecule that encodes 
a protein comprising the amino acid sequence of SEQ ID NO: 624. 

The present invention includes a substantially purified nucleic acid molecule that 
specifically hybridizes to a nucleic acid sequence of SEQ ID NO: 3 or its complement, wherein 
the nucleic acid molecule encodes a protein comprising the amino acid sequence of SEQ ID NO:
624. 

The present invention includes a substantially purified nucleic acid molecule that encodes 
a protein comprising the amino acid sequence of SEQ ID NO: 625. 

The present invention includes a substantially purified nucleic acid molecule that 
specifically hybridizes to a nucleic acid sequence of SEQ ID NO: 4 or its complement, wherein
the nucleic acid molecule encodes a protein comprising the amino acid sequence of SEQ ID NO: 
625. 

The present invention includes a substantially purified nucleic acid molecule comprising
a nucleic acid sequence which encodes a plant HES1 protein.

The present invention includes an antibody capable of specifically binding a protein
comprising the amino acid sequence of SEQ ID NO: 622.

The present invention includes an antibody capable of specifically binding a protein
comprising the amino acid sequence of SEQ ID NO: 623.

The present invention includes an antibody capable of specifically binding a protein
comprising the amino acid sequence of SEQ ID NO: 624.

The present invention includes an antibody capable of specifically binding a protein
comprising the amino acid sequence of SEQ ID NO: 625.

The present invention also provides a transformed plant having a nucleic acid molecule
which comprises: (A) an exogenous promoter region which functions in a plant cell to cause the
production of an mRNA molecule; which is linked to (B) a structural nucleic acid molecule,
wherein the structural nucleic acid molecule comprises a nucleic acid sequence encoding a
protein having an amino acid sequence selected from the group consisting of SEQ ID NO: 622
through SEQ ID NO: 626 or fragment thereof; which is linked to (C) a 3' non-translated sequence
that functions in the plant cell to cause termination of transcription and addition of
polyadenylated ribonucleotides to a 3' end of the mRNA molecule.

The present invention also provides a transformed plant having a nucleic acid molecule
which comprises: (A) an exogenous promoter region which functions in a plant cell to cause the
production of an mRNA molecule; which is linked to (B) a transcribed nucleic acid molecule with
a transcribed strand and a non-transcribed strand, wherein the transcribed strand is
complementary to a nucleic acid molecule comprising a nucleic acid sequence selected from the
group consisting of SEQ ID NO: 1 through SEQ ID NO: 621 or fragment thereof; which is linked
to (C) a 3' non-translated sequence that functions in plant cells to cause termination of
transcription and addition of polyadenylated ribonucleotides to a 3' end of the mRNA molecule.
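
By way of illustration only, the complementarity between a transcribed strand and a target
nucleic acid sequence can be checked computationally. The following minimal Python sketch is
not part of the disclosed methods; the function names and the short example sequence are
hypothetical and are not taken from SEQ ID NO: 1 through SEQ ID NO: 621. It assumes
unambiguous DNA bases (A, C, G, T) only.

```python
# Illustrative sketch only: function names and sequences are hypothetical.
# Assumes plain DNA strings containing only A, C, G and T (either case).

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string, read 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def is_complementary(transcribed_strand, target):
    """True if transcribed_strand is the exact reverse complement of target."""
    return transcribed_strand.upper() == reverse_complement(target).upper()


if __name__ == "__main__":
    target = "ATGGCC"                       # hypothetical sense-strand fragment
    antisense = reverse_complement(target)  # candidate transcribed strand
    print(antisense, is_complementary(antisense, target))
```

An exact-match test such as this is stricter than hybridization, which tolerates some
mismatches depending on the conditions used.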

The present invention also provides a method for determining a level or pattern of a
protein in a plant comprising: (A) incubating, under conditions permitting nucleic acid
hybridization, a marker nucleic acid molecule, the marker nucleic acid molecule selected from
the group of marker nucleic acid molecules which specifically hybridize to a nucleic acid
molecule having the nucleic acid sequence of SEQ ID NO: 1 through SEQ ID NO: 621 or
complements thereof, with a complementary nucleic acid molecule obtained from the plant cell
or plant tissue, wherein nucleic acid hybridization between the marker nucleic acid molecule and
the complementary nucleic acid molecule obtained from the plant permits the detection of an
mRNA for the enzyme; (B) permitting hybridization between the marker nucleic acid molecule
and the complementary nucleic acid molecule obtained from the plant cell or plant tissue; and (C)
detecting the level or pattern of the complementary nucleic acid, wherein the detection of the
complementary nucleic acid is predictive of the level or pattern of the protein in the plant.

The present invention also provides a method for determining a level or pattern of a
protein in a plant under evaluation which comprises assaying the concentration of a molecule
whose concentration is dependent upon the expression of a gene, wherein the gene specifically
hybridizes to a nucleic acid molecule having a nucleic acid sequence selected from the group
consisting of SEQ ID NO: 1 through SEQ ID NO: 621 or complements thereof, in comparison to the
concentration of that molecule present in a reference plant with a known level or pattern of the
protein, wherein the assayed concentration of the molecule is compared to the assayed
concentration of the molecule in the reference plant with the known level or pattern of the
protein.

The present invention also provides a method for determining a mutation in a plant whose
presence is predictive of a mutation affecting a level or pattern of a protein comprising the steps:
(A) incubating, under conditions permitting nucleic acid hybridization, a marker nucleic acid,
the marker nucleic acid selected from the group of marker nucleic acid molecules which
specifically hybridize to a nucleic acid molecule having a nucleic acid sequence selected from the
group of SEQ ID NO: 1 through SEQ ID NO: 621 or complements thereof and a complementary
nucleic acid molecule obtained from the plant, wherein nucleic acid hybridization between the
marker nucleic acid molecule and the complementary nucleic acid molecule obtained from the
plant permits the detection of a polymorphism whose presence is predictive of a mutation
affecting the level or pattern of the protein in the plant; (B) permitting hybridization between the
marker nucleic acid molecule and the complementary nucleic acid molecule obtained from the
plant; and (C) detecting the presence of the polymorphism, wherein the detection of the
polymorphism is predictive of the mutation.

The present invention also provides a method of producing a plant containing an
overexpressed protein comprising: (A) transforming the plant with a functional nucleic acid
molecule, wherein the functional nucleic acid molecule comprises a promoter region, wherein the
promoter region is linked to a structural region, wherein the structural region has a nucleic acid
sequence selected from the group consisting of SEQ ID NO: 1 through SEQ ID NO: 621, wherein
the structural region is linked to a 3' non-translated sequence that functions in the plant to cause
termination of transcription and addition of polyadenylated ribonucleotides to a 3' end of an
mRNA molecule; and wherein the functional nucleic acid molecule results in overexpression of
the protein; and (B) growing the transformed plant.

The present invention also provides a method of producing a plant containing an
overexpressed protein comprising: (A) transforming the plant with a functional nucleic acid
molecule, wherein the functional nucleic acid molecule comprises a promoter region, wherein the
promoter region is linked to a structural region, wherein the structural region encodes a protein
comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 622 through
SEQ ID NO: 626, wherein the structural region is linked to a 3' non-translated sequence that
functions in the plant to cause termination of transcription and addition of polyadenylated
ribonucleotides to a 3' end of an mRNA molecule; and wherein the functional nucleic acid
molecule results in overexpression of the protein; and (B) growing the transformed plant.

The present invention also provides a method of producing a plant containing reduced
levels of a protein comprising: (A) transforming the plant with a functional nucleic acid
molecule, wherein the functional nucleic acid molecule comprises a promoter region, wherein the
promoter region is linked to a structural region, wherein the structural region comprises a nucleic
acid molecule having a nucleic acid sequence selected from the group consisting of SEQ ID NO:
1 through SEQ ID NO: 621; wherein the structural region is linked to a 3' non-translated
sequence that functions in the plant to cause termination of transcription and addition of
polyadenylated ribonucleotides to a 3' end of an mRNA molecule; and wherein the functional
nucleic acid molecule results in co-suppression of the protein; and (B) growing the transformed
plant.

The present invention also provides a method of producing a plant containing reduced
levels of a protein comprising: (A) transforming the plant with a functional nucleic acid
molecule, wherein the functional nucleic acid molecule comprises a promoter region, wherein the
promoter region is linked to a structural region, wherein the structural region encodes a protein
comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 622 through
SEQ ID NO: 626; wherein the structural region is linked to a 3' non-translated sequence that
functions in the plant to cause termination of transcription and addition of polyadenylated
ribonucleotides to a 3' end of an mRNA molecule; and wherein the functional nucleic acid
molecule results in co-suppression of the protein; and (B) growing the transformed plant.

The present invention also provides a method for reducing expression of a protein in a
plant comprising: (A) transforming the plant with a nucleic acid molecule, the nucleic acid
molecule having an exogenous promoter region which functions in a plant cell to cause the
production of an mRNA molecule, wherein the exogenous promoter region is linked to a
transcribed nucleic acid molecule having a transcribed strand and a non-transcribed strand,
wherein the transcribed strand is complementary to a nucleic acid molecule having a nucleic acid
sequence selected from the group consisting of SEQ ID NO: 1 through SEQ ID NO: 621 or
fragments thereof and the transcribed strand is complementary to an endogenous mRNA
molecule; and wherein the transcribed nucleic acid molecule is linked to a 3' non-translated
sequence that functions in the plant cell to cause termination of transcription and addition of
polyadenylated ribonucleotides to a 3' end of an mRNA molecule; and (B) growing the
transformed plant.

The present invention also provides a method for reducing expression of a protein in a
plant comprising: (A) transforming the plant with a nucleic acid molecule, the nucleic acid
molecule having an exogenous promoter region which functions in a plant cell to cause the
production of an mRNA molecule, wherein the exogenous promoter region is linked to a
transcribed nucleic acid molecule having a transcribed strand and a non-transcribed strand,
wherein the transcribed strand is complementary to a nucleic acid molecule that
encodes a protein comprising an amino acid sequence selected from the group consisting of SEQ ID
NO: 622 through SEQ ID NO: 626 or fragments thereof and the transcribed strand is
complementary to an endogenous mRNA molecule; and wherein the transcribed nucleic acid
molecule is linked to a 3' non-translated sequence that functions in the plant cell to cause
termination of transcription and addition of polyadenylated ribonucleotides to a 3' end of an
mRNA molecule; and (B) growing the transformed plant.

The present invention also provides a method of determining an association between a 
polymorphism and a plant trait comprising: (A) hybridizing a nucleic acid molecule specific for 
the polymorphism to genetic material of a plant, wherein the nucleic acid molecule has a nucleic 
acid sequence selected from the group consisting of SEQ ID NO: 1 through SEQ ID NO: 621 or 
complements thereof or fragment of either; and (B) calculating the degree of association between 
the polymorphism and the plant trait. 

The present invention also provides a method of isolating a nucleic acid that encodes a 
protein or fragment thereof comprising: (A) incubating under conditions permitting nucleic acid 
hybridization, a first nucleic acid molecule comprising a nucleic acid sequence selected from the 
group consisting of SEQ ID NO: 1 through SEQ ID NO: 621 or complements thereof or fragment 
of either with a complementary second nucleic acid molecule obtained from a plant; 
(B) permitting hybridization between the first nucleic acid molecule and the second nucleic acid 
molecule obtained from the plant; and (C) isolating the second nucleic acid molecule. 

The present invention also provides a method for producing a protein or fragment thereof 
in an organism comprising introducing a vector comprising a nucleic acid of the present 
invention and expressing the protein or fragment. 



DETAILED DESCRIPTION OF THE INVENTION 

One skilled in the art can refer to general reference texts for detailed descriptions of
known techniques discussed herein or equivalent techniques. These texts include Current
Protocols in Molecular Biology (Ausubel et al., eds., John Wiley & Sons, N.Y. (1989), and
supplements through September (1998)) and Molecular Cloning, A Laboratory Manual (Sambrook et
al., 2nd Ed., Cold Spring Harbor Press, Cold Spring Harbor, New York (1989)), for example,
each of which is specifically incorporated by reference in its entirety. These texts can also be
referred to in making or using an aspect of the invention.

The agents of the invention will preferably be "biologically active" with respect to either 
a structural attribute, such as the capacity of a nucleic acid to hybridize to another nucleic acid
molecule, or the ability of a protein to be bound by an antibody (or to compete with another 
molecule for such binding). Alternatively, such an attribute may be catalytic and thus involve the 
capacity of the agent to mediate a chemical reaction or response. 

The term "substantially purified", as used herein, refers to a molecule separated from
substantially all other molecules normally associated with it in its native state. More preferably a
substantially purified molecule is the predominant species present in a preparation. A
substantially purified molecule may be greater than 60% free, preferably 75% free, more
preferably 90% free, and most preferably 95% free from the other molecules (exclusive of
solvent) present in the natural mixture. The term "substantially purified" is not intended to
encompass molecules present in their native state.

The agents of the invention may also be recombinant. As used herein, the term 
recombinant means any agent (e.g., DNA, peptide etc.), that is, or results, however indirect, from
human manipulation of a nucleic acid molecule. 

It is understood that the agents of the invention may be labeled with reagents that
facilitate detection of the agent (e.g., fluorescent labels, Prober et al., Science 238:336-340
(1987); Albarella et al., EP 144914; chemical labels, Sheldon et al., U.S. Patent 4,582,789;
Albarella et al., U.S. Patent 4,563,417; modified bases, Miyoshi et al., EP 119448, all of which
are hereby incorporated by reference in their entirety). It is further understood that the invention
provides recombinant bacterial, mammalian, microbial, archaebacterial, insect, fungal, and plant
cells as well as viral constructs comprising the agents of the invention.
(a) Nucleic Acid Molecules 
Agents of the invention include nucleic acid molecules and, more preferably, nucleic acid
molecules of maize, soybean, canola, yeast, or Arabidopsis. In addition, a number of different
plants can be the ultimate source of the nucleic acid molecules of the invention. An exemplary
group of maize genotypes includes: B73 (Illinois Foundation Seeds, Champaign, Illinois U.S.A.);
B73 x Mo17 (Illinois Foundation Seeds, Champaign, Illinois U.S.A.); DK604 (Dekalb Genetics,
Dekalb, Illinois U.S.A.); H99 (Illinois Foundation Seeds, Champaign, Illinois U.S.A.); RX601
(Asgrow Seed Company, Des Moines, Iowa); and Mo17 (Illinois Foundation Seeds, Champaign,
Illinois U.S.A.). An exemplary group of soybean types includes: Asgrow 3244 (Asgrow
Seed Company, Des Moines, Iowa); C1944 (United States Department of Agriculture (USDA)
Soybean Germplasm Collection, Urbana, Illinois U.S.A.); Cristalina (USDA Soybean
Germplasm Collection, Urbana, Illinois U.S.A.); FT108 (Monsoy, Brazil); Hartwig (USDA
Soybean Germplasm Collection, Urbana, Illinois U.S.A.); BW211S Null (Tohoku University,
Morioka, Japan); PI507354 (USDA Soybean Germplasm Collection, Urbana, Illinois U.S.A.);
Asgrow A4922 (Asgrow Seed Company, Des Moines, Iowa U.S.A.); PI227687 (USDA Soybean
Germplasm Collection, Urbana, Illinois U.S.A.); PI229358 (USDA Soybean Germplasm
Collection, Urbana, Illinois U.S.A.); and Asgrow A3237 (Asgrow Seed Company, Des Moines,
Iowa U.S.A.).

A particularly preferred embodiment of the nucleic acid molecules of the present
invention are plant nucleic acid molecules that comprise a nucleic acid sequence which encodes an
oxysterol-binding protein consensus sequence, for example, soybean HES1 (SEQ ID NOS: 622,
623 and 624), and maize HES1 (SEQ ID NO: 625).







Another particularly preferred embodiment of the nucleic acid molecules of the present
invention are yeast nucleic acid molecules that comprise a nucleic acid sequence which encodes
an oxysterol-binding protein consensus sequence, for example yeast HES1 (SEQ ID NO: 626).

A particularly preferred embodiment of the nucleic acid molecules of the invention are
nucleic acid molecules that encode a protein or fragment thereof where the protein or fragment
thereof is selected from the group consisting of a HES1, HMGCoA reductase, squalene synthase,
cycloartenol synthase, SMTI, SMTII and UPC2. A more particularly preferred embodiment of
the nucleic acid molecules of the present invention are nucleic acid molecules that encode a
protein or fragment thereof where the protein or fragment thereof is selected from the group
consisting of a fungal, more preferably a yeast HES1, a plant, more preferably a maize, soybean
or Arabidopsis HES1, a plant, more preferably a rubber or an Arabidopsis HMGCoA reductase, a
plant, more preferably an Arabidopsis squalene synthase, a plant, more preferably an Arabidopsis
cycloartenol synthase, a plant, more preferably an Arabidopsis SMTI or SMTII and a fungus,
more preferably a yeast UPC2.

In a preferred embodiment, the nucleic acid molecule encodes a HES1 protein, preferably a
plant HES1 protein comprising an oxysterol-binding protein consensus sequence: E(K, Q) xSH
(H, R) PPx (S, T, A, C, F)A. In a further preferred embodiment, the nucleic acid molecule
encodes a HES1 protein comprising an amino acid sequence selected from the group consisting
of SEQ ID NO: 622, SEQ ID NO: 623, SEQ ID NO: 624 and SEQ ID NO: 625. In a further
preferred embodiment, the nucleic acid molecule encodes a HES1 protein with a
conservative amino acid substitution in an amino acid sequence selected from the group
consisting of SEQ ID NO: 622, SEQ ID NO: 623, SEQ ID NO: 624 and SEQ ID NO: 625. In a
further preferred embodiment, the nucleic acid molecule encodes a HES1 protein with
between 2 and 5 conservative amino acid substitutions in an amino acid sequence selected from
the group consisting of SEQ ID NO: 622, SEQ ID NO: 623, SEQ ID NO: 624 and SEQ ID NO:
625. In a further preferred embodiment, the nucleic acid molecule encodes a HES1
protein with between 5 and 10 conservative amino acid substitutions in an amino acid sequence
selected from the group consisting of SEQ ID NO: 622, SEQ ID NO: 623, SEQ ID NO: 624 and
SEQ ID NO: 625. In a further preferred embodiment, the nucleic acid molecule encodes a HES1
protein with more than 10 conservative amino acid substitutions in an amino acid sequence
selected from the group consisting of SEQ ID NO: 622, SEQ ID NO: 623, SEQ ID NO: 624 and
SEQ ID NO: 625.
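
By way of illustration only, the consensus sequence recited above may be written as a pattern and used to screen a candidate amino acid sequence. The following Python sketch assumes that "x" denotes any single residue and that residues listed in parentheses are alternatives at one position. The function name and the toy sequence are hypothetical and are not taken from SEQ ID NO: 622 through SEQ ID NO: 626.

```python
import re

# Consensus from the text: E (K,Q) x S H (H,R) P P x (S,T,A,C,F) A.
# Assumption: "x" is any single amino acid, written here as [A-Z].
OSBP_MOTIF = re.compile(r"E[KQ][A-Z]SH[HR]PP[A-Z][STACF]A")


def find_osbp_motif(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each motif occurrence."""
    return [(m.start(), m.group()) for m in OSBP_MOTIF.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequence used only to exercise the pattern.
    toy = "MAAEKGSHHPPLSAGG"
    print(find_osbp_motif(toy))
```

A sequence lacking the final alanine of the consensus, for example, would return an empty list under this reading of the notation.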

In another preferred embodiment, the nucleic acid molecule encodes a HES1 protein,
preferably a yeast HES1 protein comprising an oxysterol-binding protein consensus sequence:
E(K, Q) xSH (H, R) PPx (S, T, A, C, F)A. In a further preferred embodiment, the nucleic acid
molecule encodes a HES1 protein comprising an amino acid sequence SEQ ID NO: 626. In a
further preferred embodiment, the nucleic acid molecule encodes a HES1 protein with
a conservative amino acid substitution in amino acid sequence SEQ ID NO: 626. In a further
preferred embodiment, the nucleic acid molecule encodes a HES1 protein with
between 2 and 5 conservative amino acid substitutions in an amino acid sequence SEQ ID NO:
626. In a further preferred embodiment, the nucleic acid molecule encodes a HES1
protein with between 5 and 10 conservative amino acid substitutions in an amino acid sequence
SEQ ID NO: 626. In a further preferred embodiment, the nucleic acid molecule encodes a HES1
protein with more than 10 conservative amino acid substitutions in an amino acid sequence SEQ
ID NO: 626.

In an aspect of the present invention, one or more of the nucleic acid molecules of the
present invention differ in nucleic acid sequence from those encoding a protein or fragment
thereof in SEQ ID NO: 1 through SEQ ID NO: 621 due to the degeneracy in the genetic code in
that they encode the same protein but differ in nucleic acid sequence. In another further aspect of
the present invention, one or more of the nucleic acid molecules of the present invention differ in
nucleic acid sequence from those encoding a protein or fragment thereof in SEQ ID NO: 1
through SEQ ID NO: 621 due to the fact that the different nucleic acid sequence encodes a protein
having one or more conservative amino acid substitutions. Examples of conservative substitutions are
set forth in Table 1. It is understood that codons capable of coding for such conservative
substitutions are known in the art.

Table 1

Original Residue        Conservative Substitutions
Ala                     Ser
Arg                     Lys
Asn                     Gln; His
Asp                     Glu
Cys                     Ser; Ala
Gln                     Asn
Glu                     Asp
Gly                     Pro
His                     Asn; Gln
Ile                     Leu; Val
Leu                     Ile; Val
Lys                     Arg; Gln; Glu
Met                     Leu; Ile
Phe                     Met; Leu; Tyr
Ser                     Thr
Thr                     Ser
Trp                     Tyr
Tyr                     Trp; Phe
Val                     Ile; Leu

In a further aspect of the present invention, one or more of the nucleic acid molecules of
the present invention differ in nucleic acid sequence from those encoding a protein or fragment
thereof set forth in SEQ ID NO: 1 through SEQ ID NO: 621 or fragment thereof due to the fact
that one or more codons encoding an amino acid has been substituted for a codon that encodes a
nonessential substitution of the amino acid originally encoded.
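
As a non-limiting sketch of the distinctions drawn above, a single codon change may be classified as synonymous (a consequence of the degeneracy of the genetic code), as a conservative substitution of the kind listed in Table 1, or as neither. The Python code below assumes the standard genetic code and encodes Table 1 in one-letter amino acid codes. The function name is hypothetical, and the classification says nothing about whether a given substitution is in fact nonessential in a particular protein.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Table 1 in one-letter codes: original residue -> conservative substitutions.
CONSERVATIVE = {
    "A": "S", "R": "K", "N": "QH", "D": "E", "C": "SA", "Q": "N", "E": "D",
    "G": "P", "H": "NQ", "I": "LV", "L": "IV", "K": "RQE", "M": "LI",
    "F": "MLY", "S": "T", "T": "S", "W": "Y", "Y": "WF", "V": "IL",
}


def classify_codon_change(original: str, variant: str) -> str:
    """Classify a codon replacement against the genetic code and Table 1."""
    a = CODON_TABLE[original.upper()]
    b = CODON_TABLE[variant.upper()]
    if a == b:
        return "synonymous"
    if b in CONSERVATIVE.get(a, ""):
        return "conservative"
    return "non-conservative"


if __name__ == "__main__":
    for pair in [("GCT", "GCC"), ("GCT", "TCT"), ("GCT", "GAT")]:
        print(pair, classify_codon_change(*pair))
```

Here GCT and GCC both encode Ala, GCT to TCT replaces Ala with Ser (listed in Table 1), and GCT to GAT replaces Ala with Asp (not listed in Table 1).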

One subset of the nucleic acid molecules of the invention is fragment nucleic acid
molecules. Fragment nucleic acid molecules may consist of significant portion(s) of, or indeed
most of, the nucleic acid molecules of the invention, such as those specifically disclosed.
Alternatively, the fragments may comprise smaller oligonucleotides (having from about 15 to
about 400 nucleotide residues and more preferably, about 15 to about 30 nucleotide residues, or
about 50 to about 100 nucleotide residues, or about 100 to about 200 nucleotide residues, or
about 200 to about 400 nucleotide residues, or about 275 to about 350 nucleotide residues).

A fragment of one or more of the nucleic acid molecules of the invention may be a probe
and specifically a PCR probe. A PCR probe is a nucleic acid molecule capable of initiating a
polymerase activity while in a double-stranded structure with another nucleic acid. Various
methods for determining the structure of PCR probes and PCR techniques exist in the art.
Computer generated searches using programs such as Primer3
(www-genome.wi.mit.edu/cgi-bin/primer/primer3.cgi), STSPipeline
(www-genome.wi.mit.edu/cgi-bin/www-STS_Pipeline), or
GeneUp (Pesole et al., BioTechniques 25:112-123 (1998)), for example, can be used to identify
potential PCR primers.

As used herein, two nucleic acid molecules are said to be capable of specifically
hybridizing to one another if the two molecules are capable of forming an anti-parallel,
double-stranded nucleic acid structure.

A nucleic acid molecule is said to be the "complement" of another nucleic acid molecule
if they exhibit complete complementarity. As used herein, molecules are said to exhibit
"complete complementarity" when every nucleotide of one of the molecules is complementary to
a nucleotide of the other. Two molecules are said to be "minimally complementary" if they can
hybridize to one another with sufficient stability to permit them to remain annealed to one
another under at least conventional "low-stringency" conditions. Similarly, the molecules are
said to be "complementary" if they can hybridize to one another with sufficient stability to permit
them to remain annealed to one another under conventional "high-stringency" conditions.
Conventional stringency conditions are described by Sambrook et al., Molecular Cloning, A
Laboratory Manual, 2nd Ed., Cold Spring Harbor Press, Cold Spring Harbor, New York (1989)
and by Haymes et al., Nucleic Acid Hybridization, A Practical Approach, IRL Press,
Washington, DC (1985). Departures from complete complementarity are therefore permissible,
as long as such departures do not completely preclude the capacity of the molecules to form a
double-stranded structure. Thus, in order for a nucleic acid molecule to serve as a primer or
probe it need only be sufficiently complementary in sequence to be able to form a stable
double-stranded structure under the particular solvent and salt concentrations employed.
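
The notion of complete complementarity defined above can be illustrated with a short Python sketch. It assumes DNA sequences written 5' to 3' using only A, C, G and T, and the function names are hypothetical. The sketch counts base pairs in an anti-parallel alignment only. It does not model hybridization stability, so it cannot decide whether two molecules are "minimally complementary" or "complementary" in the stringency-based sense used herein.

```python
# Watson-Crick pairing for DNA; sequences are assumed to be written 5' to 3'.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the strand that pairs anti-parallel with seq, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_complete_complement(a: str, b: str) -> bool:
    """True when every nucleotide of a pairs with a nucleotide of b."""
    return len(a) == len(b) and reverse_complement(a) == b.upper()


def fraction_paired(a: str, b: str) -> float:
    """Fraction of positions that pair when equal-length strands align anti-parallel."""
    if len(a) != len(b) or not a:
        raise ValueError("strands must be non-empty and of equal length")
    partner = reverse_complement(b)
    return sum(x == y for x, y in zip(a.upper(), partner)) / len(a)


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(is_complete_complement("ATGC", "GCAT"))
    print(fraction_paired("ATGC", "GCAA"))
```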

Appropriate stringency conditions which promote DNA hybridization, for example,
6.0 X sodium chloride/sodium citrate (SSC) at about 45°C, followed by a wash of 2.0 X SSC at
20-25°C, are known to those skilled in the art or can be found in Current Protocols in Molecular
Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. For example, the salt concentration in the
wash step can be selected from a low stringency of about 2.0 X SSC at 50°C to a high stringency
of about 0.2 X SSC at 65°C. In addition, the temperature in the wash step can be increased from
low stringency conditions at room temperature, about 22°C, to high stringency conditions at
about 65°C. Both temperature and salt may be varied, or either the temperature or the salt
concentration may be held constant while the other variable is changed.
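
For orientation only, the SSC strengths recited above can be converted to approximate sodium ion concentrations. The Python sketch below assumes the conventional composition of 1 X SSC (0.15 M sodium chloride and 0.015 M trisodium citrate), which is not stated in this specification, and treats the citrate as contributing three sodium ions. The names are hypothetical.

```python
# Assumed composition of 1 X SSC (conventional recipe, not recited in the text).
NACL_M_PER_X = 0.15      # mol/L sodium chloride
CITRATE_M_PER_X = 0.015  # mol/L trisodium citrate, three Na+ per molecule


def sodium_molarity(ssc_x: float) -> float:
    """Approximate Na+ concentration (mol/L) of an SSC solution of strength ssc_x."""
    return ssc_x * (NACL_M_PER_X + 3 * CITRATE_M_PER_X)


# Wash conditions recited in the text: (SSC strength, temperature in degrees C).
WASHES = {
    "low stringency": (2.0, 50),
    "high stringency": (0.2, 65),
}

if __name__ == "__main__":
    for label, (ssc_x, temp_c) in WASHES.items():
        print(f"{label}: {ssc_x} X SSC at {temp_c} C, about {sodium_molarity(ssc_x):.3f} M Na+")
```

Under these assumptions the high-stringency wash has roughly one tenth the sodium ion concentration of the low-stringency wash, in addition to the higher temperature.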

In a preferred embodiment, a nucleic acid of the invention will specifically hybridize to
one or more of the nucleic acid molecules set forth in SEQ ID NO: 1 through SEQ ID NO: 621 or
complements thereof or more preferably to a nucleic acid molecule having a nucleic acid
sequence selected from the group consisting of SEQ ID NO: 1 through SEQ ID NO: 4, SEQ ID
NO: 6 through SEQ ID NO: 29 or complements thereof under moderately stringent conditions,
for example at about 2.0 X SSC and about 65°C.

In a particularly preferred embodiment, a nucleic acid of the invention will include those
nucleic acid molecules that specifically hybridize to one or more of the nucleic acid molecules
set forth in SEQ ID NO: 1 through SEQ ID NO: 621 or complements thereof or more preferably
to a nucleic acid molecule having a nucleic acid sequence selected from the group consisting of
SEQ ID NO: 1 through SEQ ID NO: 4, SEQ ID NO: 6 through SEQ ID NO: 29 or complements
thereof under high stringency conditions such as 0.2 X SSC and about 65°C.

In one aspect of the invention, the nucleic acid molecules of the invention have one or
more of the nucleic acid sequences set forth in SEQ ID NO: 1 through SEQ ID NO: 621 or
complements thereof or fragment thereof or more preferably to a nucleic acid molecule having
SEQ ID NO: 1 through SEQ ID NO: 4, SEQ ID NO: 6 through SEQ ID NO: 29 or complements
thereof. In another aspect of the invention, one or more of the nucleic acid molecules of the
invention share between about 100% and 70% sequence identity with one or more of the nucleic
acid sequences set forth in SEQ ID NO: 1 through SEQ ID NO: 621 or complements thereof or
more preferably to a nucleic acid molecule having a nucleic acid sequence selected from the
group consisting of SEQ ID NO: 1 through SEQ ID NO: 4, SEQ ID NO: 6 through SEQ ID NO:
29 or complements thereof. In a further aspect of the invention, one or more of the nucleic acid
molecules of the invention share between about 100% and 90% sequence identity with one or
more of the nucleic acid sequences set forth in SEQ ID NO: 1 through SEQ ID NO: 621 or
complements thereof or more preferably to a nucleic acid molecule having a nucleic acid
sequence selected from the group consisting of SEQ ID NO: 1 through SEQ ID NO: 4, SEQ ID
NO: 6 through SEQ ID NO: 29 or complements thereof. In a more preferred aspect of the
invention, one or more of the nucleic acid molecules of the invention share between about 100%
and 95% sequence identity with one or more of the nucleic acid sequences set forth in SEQ ID
NO: 1 through SEQ ID NO: 621 or complements thereof or more preferably to a nucleic acid
molecule having a nucleic acid sequence selected from the group consisting of SEQ ID NO: 1
through SEQ ID NO: 4, SEQ ID NO: 6 through SEQ ID NO: 29 or complements thereof. In an
even more preferred aspect of the invention, one or more of the nucleic acid molecules of the
invention share between about 100% and 99% sequence identity with one or more of the
sequences set forth in SEQ ID NO: 1 through SEQ ID NO: 621 or complements thereof or more
preferably to a nucleic acid molecule having a nucleic acid sequence selected from the group
consisting of SEQ ID NO: 1 through SEQ ID NO: 4, SEQ ID NO: 6 through SEQ ID NO: 29, or
complements thereof.

In a preferred embodiment the percent identity calculations are performed using the
Megalign program of the LASERGENE bioinformatics computing suite (default parameters,
DNASTAR Inc., Madison, Wisconsin).
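
To make the sequence identity figures above concrete, the following Python sketch computes a percent identity over two sequences that have already been aligned. It is a simplified stand-in and is not the Megalign calculation: it assumes gaps are written as "-" and excludes gapped columns from the comparison, which is one of several possible conventions. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of ungapped alignment columns in which the two residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are not counted
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical toy alignment: one mismatch in ten columns.
    print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
```

A different treatment of gaps or of alignment ends would change the reported value, so thresholds such as 70%, 90%, 95% and 99% should be read against the program and parameters actually used.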

In a preferred embodiment of the present invention, the nucleic acid molecule of the
present invention encodes a protein or fragment thereof, where a protein exhibits a BLAST
probability score of greater than 1E-12, preferably a BLAST probability score of between about
1E-30 and about 1E-12, even more preferably a BLAST probability score of greater than 1E-30
with its homologue.

In a preferred embodiment of the present invention, the nucleic acid molecule of the present
invention encodes a protein or fragment thereof where a protein exhibits a BLAST score of
greater than 120, preferably a BLAST score of between about 1450 and about 120, even more
preferably a BLAST score of greater than 1450 with its homologue.

Nucleic acid molecules of the present invention can comprise sequences that encode a
protein or fragment thereof. Such proteins or fragments thereof include homologues of known
proteins in other organisms.

A nucleic acid molecule of the invention can also encode a homolog protein. As used
herein, a homolog protein molecule or fragment thereof is a counterpart protein molecule or
fragment thereof in a second species (e.g., maize HES1 is a homolog of Arabidopsis HES1). A
homolog can also be generated by molecular evolution or DNA shuffling techniques, so that the
molecule retains at least one functional or structural characteristic of the original protein (see, for
example, U.S. Patent 5,811,238).

Particularly preferred homologues are selected from the group consisting of alfalfa,
Arabidopsis, barley, Brassica, broccoli, cabbage, citrus, cotton, garlic, oat, oilseed rape, onion,
canola, flax, an ornamental plant, maize, peanut, pepper, potato, rice, rye, sorghum, soybean,
strawberry, sugarcane, sugarbeet, tomato, wheat, poplar, pine, fir, eucalyptus, apple, lettuce,
lentils, grape, banana, tea, turf grasses, sunflower, soybean, and Phaseolus. A particularly
preferred group of homologues are crops harvested for seed oils, including but not limited to
rapeseed (high erucic acid rape and canola), maize, soybean, safflower, sunflower, cotton,
peanut, flax, oil palm and Cuphea.

In a preferred embodiment, nucleic acid molecules having SEQ ID NO: 1 through SEQ
ID NO: 621 or complements and fragments of either can be utilized to obtain such homologues.

The degeneracy of the genetic code, which allows different nucleic acid sequences to 
code for the same protein or peptide, is known in the literature. (U.S. Patent No. 4,757,006, the 
entirety of which is herein incorporated by reference). 

Agents of the invention include substantially purified nucleic acid molecules
encoding at least about a 10 amino acid region, more preferably a
20, 30, 40, or 50 amino acid region, of a protein selected from the group consisting of a fungal,
more preferably a yeast HES1, a plant, more preferably a maize, soybean or Arabidopsis HES1, a
plant, more preferably a rubber or an Arabidopsis HMGCoA reductase, a plant, more preferably
an Arabidopsis squalene synthase, a plant, more preferably an Arabidopsis cycloartenol synthase,
a plant, more preferably an Arabidopsis SMTI or SMTII and a fungus, more preferably a yeast
UPC2.

(b) Protein and Peptide Molecules 

A class of agents comprises one or more of the protein or fragments thereof or peptide
molecules having a nucleic acid sequence selected from the group consisting of SEQ ID NO: 1
through SEQ ID NO: 621 or one or more of the protein or fragment thereof and peptide
molecules encoded by other nucleic acid agents of the invention. A particularly preferred class of
proteins are those having an amino acid sequence selected from the group consisting of SEQ ID
NO: 622 through SEQ ID NO: 625 or fragments thereof.

As used herein, the term "protein molecule" or "peptide molecule" includes any molecule
that comprises five or more amino acids. It is well known in the art that proteins may undergo
modification, including post-translational modifications, such as, but not limited to, disulfide
bond formation, glycosylation, phosphorylation, or oligomerization. Thus, as used herein, the
term "protein molecule" or "peptide molecule" includes any protein molecule that is modified by
any biological or non-biological process. The terms "amino acid" and "amino acids" refer to all
naturally occurring L-amino acids. This definition is meant to include norleucine, norvaline,
ornithine, homocysteine, and homoserine.

One or more of the protein or fragment of peptide molecules may be produced via
chemical synthesis, or more preferably, by expressing in a suitable bacterial or eukaryotic host.
Suitable methods for expression are described by Sambrook et al., In: Molecular Cloning, A
Laboratory Manual, 2nd Edition, Cold Spring Harbor Press, Cold Spring Harbor, New York
(1989), or similar texts.

A "protein fragment" is a peptide or polypeptide molecule whose amino acid sequence
comprises a subset of the amino acid sequence of that protein. A protein or fragment thereof that
comprises one or more additional peptide regions not derived from that protein is a "fusion"
protein. Such molecules may be derivatized to contain carbohydrate or other moieties (such as
keyhole limpet hemocyanin, etc.). Fusion protein or peptide molecules of the invention are
preferably produced via recombinant means.

Another class of agents comprises protein or peptide molecules or fragments or fusions
thereof comprising SEQ ID NO: 622 through SEQ ID NO: 625 or fragment thereof or encoded by
SEQ ID NO: 1 through SEQ ID NO: 621 in which conservative, non-essential or non-relevant
amino acid residues have been added, replaced or deleted. Computerized means for designing
modifications in protein structure are known in the art (Dahiyat and Mayo, Science 278:82-87
(1997), the entirety of which is herein incorporated by reference).

A particularly preferred embodiment of the nucleic acid molecules of the present
invention are proteins comprising an amino acid sequence which corresponds to an
oxysterol-binding protein consensus sequence.

In a preferred embodiment of the present invention, the nucleic acid molecule of the present
invention encodes a protein or fragment thereof, where a protein exhibits a BLAST probability
score of greater than 1E-12, preferably a BLAST probability score of between about 1E-30 and
about 1E-12, even more preferably a BLAST probability score of greater than 1E-30 with its
homologue.

In a preferred embodiment of the present invention, the nucleic acid molecule of the present
invention encodes a protein or fragment thereof where a protein exhibits a BLAST score of
greater than 120, preferably a BLAST score of between about 1450 and about 120, even more
preferably a BLAST score of greater than 1450 with its homologue.

In another preferred embodiment of the present invention, the nucleic acid molecule
encoding a protein or fragment thereof exhibits a % identity with its homologue of between about
25% and about 40%, more preferably of between about 40% and about 70%, even more preferably
of between about 70% and about 90% and even more preferably between about 90% and 99%.
In another preferred embodiment of the present invention, a protein or fragments thereof exhibits
a % identity with its homologue of 100%.

In a preferred embodiment the percent identity calculations are performed using the
Megalign program of the LASERGENE bioinformatics computing suite (default parameters,
DNASTAR Inc., Madison, Wisconsin).

A protein of the invention can also be a homologue protein. As used herein, a homologue
protein molecule or fragment thereof is a counterpart protein molecule or fragment thereof in a
second species (e.g., maize HMGCoA reductase is a homologue of Arabidopsis HMGCoA
reductase). A homologue can also be generated by molecular evolution or DNA shuffling
techniques, so that the molecule retains at least one functional or structural characteristic of the
original (see, for example, U.S. Patent 5,811,238, the entirety of which is herein incorporated by
reference).

Particularly preferred homologues are selected from the group consisting of alfalfa,
Arabidopsis, barley, Brassica, broccoli, cabbage, citrus, cotton, garlic, oat, oilseed rape, onion,
canola, flax, an ornamental plant, maize, peanut, pepper, potato, rice, rye, sorghum, soybean,
strawberry, sugarcane, sugarbeet, tomato, wheat, poplar, pine, fir, eucalyptus, apple, lettuce,
lentils, grape, banana, tea, turf grasses, sunflower, soybean, and Phaseolus. A particularly
preferred group of homologues are those from oil plants such as cotton, canola and sunflower.

In a preferred embodiment, nucleic acid molecules having SEQ ID NO: 1 through SEQ
ID NO: 621 or complements and fragments of either can be utilized to obtain such homologues.

The degeneracy of the genetic code, which allows different nucleic acid sequences to
code for the same protein or peptide, is known in the literature. (U.S. Patent No. 4,757,006, the
entirety of which is herein incorporated by reference).

Agents of the invention include proteins comprising at least about a 10 amino acid region,
more preferably a 20, 30, 40, or 50 amino acid region, of a protein selected from the group
consisting of a fungal, more preferably a yeast HES1, a plant, more preferably a maize, soybean
or Arabidopsis HES1, a plant, more preferably a rubber or an Arabidopsis HMGCoA reductase, a
plant, more preferably an Arabidopsis squalene synthase, a plant, more preferably an Arabidopsis
cycloartenol synthase, a plant, more preferably an Arabidopsis SMTI or SMTII and a fungus,
more preferably a yeast UPC2.

(c) Plant Constructs and Plant Transformants

One or more of the nucleic acid molecules of the invention may be used in plant
transformation or transfection. Exogenous genetic material may be transferred into a plant cell
and the plant cell regenerated into a whole, fertile or sterile plant. Exogenous genetic material is
any genetic material, whether naturally occurring or otherwise, from any source that is capable of
being inserted into any organism. In a preferred embodiment the exogenous genetic material
includes a nucleic acid molecule of the present invention, preferably a nucleic acid molecule
having a sequence selected from the group consisting of SEQ ID NO: 1 through SEQ ID NO: 621
or complements thereof or fragments of either. Another preferred class of exogenous genetic
material are nucleic acid molecules that encode a protein having an amino acid sequence selected
from the group consisting of SEQ ID NO: 622 through SEQ ID NO: 626 or fragments thereof.

Genetic material may be transferred into either monocotyledons or dicotyledons
including, but not limited to maize, soybean, Arabidopsis, phaseolus, peanut, alfalfa, wheat, rice,
oat, sorghum, rye, tritordeum, millet, fescue, perennial ryegrass, sugarcane, cranberry, papaya,
banana, muskmelon, apple, cucumber, dendrobium, gladiolus, chrysanthemum, liliacea,
cotton, eucalyptus, sunflower, canola, turf grass, sugarbeet, coffee and dioscorea (Christou, In:
Particle Bombardment for Genetic Engineering of Plants, Biotechnology Intelligence Unit,
Academic Press, San Diego, California (1996), the entirety of which is herein incorporated by
reference). In a particularly preferred embodiment, any seed-bearing plant may be employed as the
target plant species for modification in accordance with this invention, including angiosperms,
gymnosperms, monocotyledons, and dicotyledons. Plants of special interest are crops harvested
for seed oils, including but not limited to rapeseed (high erucic acid rape and canola), maize,
soybean, safflower, sunflower, cotton, peanut, flax, oil palm and Cuphea.

Transfer of a nucleic acid that encodes a protein can result in overexpression of that
protein in a transformed cell or transgenic plant. One or more of the proteins or fragments
thereof encoded by nucleic acid molecules of the invention may be overexpressed in a
transformed cell or transformed plant. Such overexpression may be the result of transient or
stable transfer of the exogenous genetic material.

In another preferred aspect of the present invention, exogenous genetic material is a 
nucleic acid molecule that comprises a nucleic acid sequence which encodes a HES1 protein or 
fragment thereof, more preferably a yeast HES1 protein or fragment thereof, even more 
preferably a plant HES1 protein or fragment thereof. 

In a preferred embodiment, expression or overexpression of a HES1 protein in a plant
provides in that plant, relative to an untransformed plant with a similar genetic background, an
increased level of phytosterols.

In a preferred embodiment, expression or overexpression of a HES1 protein in a plant
provides in that plant, relative to an untransformed plant with a similar genetic background, an
altered composition of phytosterols.




In another embodiment, overexpression of a HES1 protein in a plant provides in that 
plant, relative to an untransformed plant with a similar genetic background, an increased level of 
a HES1 protein in a plastid. 

In another preferred embodiment, overexpression of the HES1 protein in a transformed
plant will result in a plant which as a food or feed constituent exhibits an increased ability to act
as a cholesterol lowering agent relative to an untransformed plant with a similar genetic
background.

In a preferred embodiment of the present invention, the protein or fragment thereof
overexpressed in the transgenic plant is selected from the group consisting of a HES1, HMGCoA
reductase, squalene synthase, cycloartenol synthase, SMTI, SMTII and UPC2. In a more
particularly preferred embodiment of the present invention is a protein or fragment thereof,
where the protein or fragment thereof is selected from the group consisting of a fungal, more
preferably a yeast HES1, a plant, more preferably a maize, soybean or Arabidopsis HES1, a
plant, more preferably a rubber or an Arabidopsis HMGCoA reductase, a plant, more preferably
an Arabidopsis squalene synthase, a plant, more preferably an Arabidopsis cycloartenol synthase,
a plant, more preferably an Arabidopsis SMTI or SMTII and a fungus, more preferably a yeast
UPC2.

In another preferred embodiment of the present invention, the protein or fragment thereof
overexpressed in the transgenic plant is selected from the group consisting of a plant HES1,
HMGCoA reductase, squalene synthase, cycloartenol synthase, SMTI, SMTII and yeast UPC2.
In a further even more particularly preferred embodiment of the present invention the protein or
fragment thereof is a plant HES1. In an additional even more particularly preferred embodiment
of the present invention the protein or fragment thereof is a maize, soybean or Arabidopsis
HES1.

In another preferred embodiment of the present invention, the protein or fragment thereof
overexpressed in the transgenic plant is a HES1 protein, preferably a plant HES1 protein
comprising an oxysterol-binding protein consensus sequence: E(K, Q) xSH (H, R) PPx (S, T,
A, C, F)A. In another preferred embodiment of the present invention, the protein or fragment
thereof overexpressed in the transgenic plant is a HES1 protein that comprises an amino acid
sequence selected from the group consisting of SEQ ID NO: 622, SEQ ID NO: 623, SEQ ID NO:
624 and SEQ ID NO: 625. In another preferred embodiment of the present invention, the protein
or fragment thereof overexpressed in the transgenic plant is a HES1 protein with a conservative
amino acid substitution in an amino acid sequence selected from the group consisting of SEQ ID
NO: 622, SEQ ID NO: 623, SEQ ID NO: 624 and SEQ ID NO: 625. In another preferred
embodiment of the present invention, the protein or fragment thereof overexpressed in the
transgenic plant is a HES1 protein with between 2 and 5 conservative amino acid substitutions in
an amino acid sequence selected from the group consisting of SEQ ID NO: 622, SEQ ID NO:
623, SEQ ID NO: 624 and SEQ ID NO: 625. In another preferred embodiment of the present
invention, the protein or fragment thereof overexpressed in the transgenic plant is a HES1 protein
with between 5 and 10 conservative amino acid substitutions in an amino acid sequence selected
from the group consisting of SEQ ID NO: 622, SEQ ID NO: 623, SEQ ID NO: 624 and SEQ ID
NO: 625. In another preferred embodiment of the present invention, the protein or fragment
thereof overexpressed in the transgenic plant is a HES1 protein with more than 10 conservative
amino acid substitutions in an amino acid sequence selected from the group consisting of SEQ ID
NO: 622, SEQ ID NO: 623, SEQ ID NO: 624 and SEQ ID NO: 625.

In another preferred embodiment of the present invention, the protein or fragment thereof
overexpressed in the transgenic plant is a HES1 protein that comprises an amino acid sequence
SEQ ID NO: 626. In another preferred embodiment of the present invention, the protein or
fragment thereof overexpressed in the transgenic plant is a HES1 protein with a conservative
amino acid substitution in an amino acid sequence SEQ ID NO: 626. In another preferred
embodiment of the present invention, the protein or fragment thereof overexpressed in the
transgenic plant is a HES1 protein with between 2 and 5 conservative amino acid substitutions in
an amino acid sequence SEQ ID NO: 626. In another preferred embodiment of the present
invention, the protein or fragment thereof overexpressed in the transgenic plant is a HES1 protein
with between 5 and 10 conservative amino acid substitutions in an amino acid sequence SEQ ID
NO: 625. In another preferred embodiment of the present invention, the protein or fragment
thereof overexpressed in the transgenic plant is a HES1 protein with more than 10 conservative
amino acid substitutions in an amino acid sequence SEQ ID NO: 626.

Exogenous genetic material may be transferred into a host cell by the use of a DNA
vector or construct designed for such a purpose. Design of such a vector is generally within the
skill of the art (See, Plant Molecular Biology: A Laboratory Manual, Clark (ed.), Springer, New
York (1997), the entirety of which is herein incorporated by reference).

A construct or vector may include a plant promoter to express the protein or protein
fragment of choice. A number of promoters, which are active in plant cells, have been described
in the literature. These include the nopaline synthase (NOS) promoter (Ebert et al., Proc. Natl.
Acad. Sci. (U.S.A.) 84:5745-5749 (1987), the entirety of which is herein incorporated by
reference), the octopine synthase (OCS) promoter (which are carried on tumor-inducing plasmids
of Agrobacterium tumefaciens), the caulimovirus promoters such as the cauliflower mosaic virus
(CaMV) 19S promoter (Lawton et al., Plant Mol. Biol. 9:315-324 (1987), the entirety of which is
herein incorporated by reference) and the CaMV 35S promoter (Odell et al., Nature 313:810-812
(1985), the entirety of which is herein incorporated by reference), the figwort mosaic virus
35S-promoter, the light-inducible promoter from the small subunit of ribulose-1,5-bis-phosphate
carboxylase (ssRUBISCO), the Adh promoter (Walker et al., Proc. Natl. Acad. Sci. (U.S.A.)
84:6624-6628 (1987), the entirety of which is herein incorporated by reference), the sucrose
synthase promoter (Yang et al., Proc. Natl. Acad. Sci. (U.S.A.) 87:4144-4148 (1990), the entirety
of which is herein incorporated by reference), the R gene complex promoter (Chandler et al., The
Plant Cell 1:1175-1183 (1989), the entirety of which is herein incorporated by reference) and the
chlorophyll a/b binding protein gene promoter, etc. These promoters have been used to create
DNA constructs that have been expressed in plants; see, e.g., PCT publication WO 84/02913,
herein incorporated by reference in its entirety. The CaMV 35S promoters are preferred for use
in plants. Promoters known or found to cause transcription of DNA in plant cells can be used in
the invention.

For the purpose of expression in source tissues of the plant, such as the leaf, seed, root or
stem, it is preferred that the promoters utilized have relatively high expression in these specific
tissues. Tissue-specific expression of a protein of the present invention is a particularly preferred
embodiment. For this purpose, one may choose from a number of promoters for genes with
tissue- or cell-specific or -enhanced expression. Examples of such promoters reported in the
literature include the chloroplast glutamine synthetase GS2 promoter from pea (Edwards et al.,
Proc. Natl. Acad. Sci. (U.S.A.) 87:3459-3463 (1990), herein incorporated by reference in its
entirety), the chloroplast fructose-1,6-biphosphatase (FBPase) promoter from wheat (Lloyd et al.,
Mol. Gen. Genet. 225:209-216 (1991), herein incorporated by reference in its entirety), the
nuclear photosynthetic ST-LS1 promoter from potato (Stockhaus et al., EMBO J. 8:2445-2451
(1989), herein incorporated by reference in its entirety), the serine/threonine kinase (PAL)
promoter and the glucoamylase (CHS) promoter from Arabidopsis thaliana. Also reported to be
active in photosynthetically active tissues are the ribulose-1,5-bisphosphate carboxylase (RbcS)
promoter from eastern larch (Larix laricina), the promoter for the cab gene, cab6, from pine
(Yamamoto et al., Plant Cell Physiol. 35:113-118 (1994), herein incorporated by reference in its
entirety), the promoter for the Cab-1 gene from wheat (Fejes et al., Plant Mol. Biol. 15:921-932
(1990), herein incorporated by reference in its entirety), the promoter for the CAB-1 gene from
spinach (Lubberstedt et al., Plant Physiol. 104:997-1006 (1994), herein incorporated by
reference in its entirety), the promoter for the cab1R gene from rice (Luan et al., Plant Cell
4:971-981 (1992), the entirety of which is herein incorporated by reference), the pyruvate,
orthophosphate dikinase (PPDK) promoter from maize (Matsuoka et al., Proc. Natl. Acad. Sci.
(U.S.A.) 90:9586-9590 (1993), herein incorporated by reference in its entirety), the promoter for
the tobacco Lhcb1*2 gene (Cerdan et al., Plant Mol. Biol. 33:245-255 (1997), herein
incorporated by reference in its entirety), the Arabidopsis thaliana SUC2 sucrose-H+ symporter
promoter (Truernit et al., Planta 196:564-570 (1995), herein incorporated by reference in its
entirety) and the promoter for the thylakoid membrane proteins from spinach (psaD, psaF, psaE,
PC, FNR, atpC, atpD, cab, rbcS). Other promoters for the chlorophyll a/b-binding proteins may
also be utilized in the invention, such as the promoters for LhcB gene and PsbP gene from white
mustard (Sinapis alba; Kretsch et al., Plant Mol. Biol. 23:219-229 (1995), the entirety of which
is herein incorporated by reference).

For the purpose of expression in sink tissues of the plant, such as the tuber of the potato
plant, the fruit of tomato, or the seed of maize, wheat, rice and barley, it is preferred that the
promoters utilized in the invention have relatively high expression in these specific tissues. A
number of promoters for genes with tuber-specific or -enhanced expression are known, including
the class I patatin promoter (Bevan et al., EMBO J. 8:1899-1906 (1986); Jefferson et al., Plant
Mol. Biol. 14:995-1006 (1990), both of which are herein incorporated by reference in their
entirety), the promoter for the potato tuber ADPGPP genes, both the large and small subunits, the
sucrose synthase promoter (Salanoubat and Belliard, Gene 60:47-56 (1987); Salanoubat and
Belliard, Gene 84:181-185 (1989), both of which are incorporated by reference in their entirety),
the promoter for the major tuber proteins including the 22 kd protein complexes and proteinase
inhibitors (Hannapel, Plant Physiol. 101:703-704 (1993), herein incorporated by reference in its
entirety), the promoter for the granule bound starch synthase gene (GBSS) (Visser et al., Plant
Mol. Biol. 17:691-699 (1991), herein incorporated by reference in its entirety) and other class I
and II patatin promoters (Koster-Topfer et al., Mol. Gen. Genet. 219:390-396 (1989); Mignery et
al., Gene 62:27-44 (1988), both of which are herein incorporated by reference in their entirety).

Other promoters can also be used to express a protein or fragment thereof in specific
tissues, such as seeds or fruits. The promoter for β-conglycinin (Chen et al., Dev. Genet.
10:112-122 (1989), herein incorporated by reference in its entirety) or other seed-specific promoters
such as the napin and phaseolin promoters, can be used. The zeins are a group of storage
proteins found in maize endosperm. Genomic clones for zein genes have been isolated (Pedersen
et al., Cell 29:1015-1026 (1982), herein incorporated by reference in its entirety) and the
promoters from these clones, including the 15 kD, 16 kD, 19 kD, 22 kD, 27 kD and γ genes,
could also be used. Other promoters known to function, for example, in maize include the
promoters for the following genes: waxy, Brittle, Shrunken 2, Branching enzymes I and II, starch
synthases, debranching enzymes, oleosins, glutelins and sucrose synthases. A particularly
preferred promoter for maize endosperm expression is the promoter for the glutelin gene from
rice, more particularly the Osgt-1 promoter (Zheng et al., Mol. Cell Biol. 13:5829-5842 (1993),
herein incorporated by reference in its entirety). Examples of promoters suitable for expression
in wheat include those promoters for the ADPglucose pyrosynthase (ADPGPP) subunits, the
granule bound and other starch synthase, the branching and debranching enzymes, the
embryogenesis-abundant proteins, the gliadins and the glutenins. Examples of such promoters in
rice include those promoters for the ADPGPP subunits, the granule bound and other starch
synthase, the branching enzymes, the debranching enzymes, sucrose synthases and the glutelins.
A particularly preferred promoter is the promoter for rice glutelin, Osgt-1. Examples of such
promoters for barley include those for the ADPGPP subunits, the granule bound and other starch
synthase, the branching enzymes, the debranching enzymes, sucrose synthases, the hordeins, the
embryo globulins and the aleurone specific proteins.

Root specific promoters may also be used. An example of such a promoter is the
promoter for the acid chitinase gene (Samac et al., Plant Mol. Biol. 25:587-596 (1994), the
entirety of which is herein incorporated by reference). Expression in root tissue could also be
accomplished by utilizing the root specific subdomains of the CaMV 35S promoter that have
been identified (Lam et al., Proc. Natl. Acad. Sci. (U.S.A.) 86:7890-7894 (1989), herein
incorporated by reference in its entirety). Other root cell specific promoters include those
reported by Conkling et al. (Conkling et al., Plant Physiol. 93:1203-1211 (1990), the entirety of
which is herein incorporated by reference).

Additional promoters that may be utilized are described, for example, in U.S. Patent Nos.
5,378,619; 5,391,725; 5,428,147; 5,447,858; 5,608,144; 5,608,144; 5,614,399; 5,633,441;
5,633,435; and 4,633,436, all of which are herein incorporated in their entirety. In addition, a
tissue specific enhancer may be used (Fromm et al., The Plant Cell 1:977-984 (1989), the
entirety of which is herein incorporated by reference).

Constructs or vectors may also include, with the coding region of interest, a nucleic acid
sequence that acts, in whole or in part, to terminate transcription of that region. A number of
such sequences have been isolated, including the Tr7 3' sequence and the NOS 3' sequence
(Ingelbrecht et al., The Plant Cell 1:671-680 (1989), the entirety of which is herein incorporated
by reference; Bevan et al., Nucleic Acids Res. 11:369-385 (1983), the entirety of which is herein
incorporated by reference).

A vector or construct may also include regulatory elements. Examples of such include
the Adh intron 1 (Callis et al., Genes and Develop. 1:1183-1200 (1987), the entirety of which is
herein incorporated by reference), the sucrose synthase intron (Vasil et al., Plant Physiol.
91:1575-1579 (1989), the entirety of which is herein incorporated by reference) and the TMV
omega element (Gallie et al., The Plant Cell 1:301-311 (1989), the entirety of which is herein
incorporated by reference). These and other regulatory elements may be included when
appropriate.

A vector or construct may also include a selectable marker. Selectable markers may also
be used to select for plants or plant cells that contain the exogenous genetic material. Examples
of such include, but are not limited to: a neo gene (Potrykus et al., Mol. Gen. Genet.
199:183-188 (1985), the entirety of which is herein incorporated by reference), which codes for
kanamycin resistance and can be selected for using kanamycin, G418, etc.; a bar gene which
codes for bialaphos resistance; a mutant EPSP synthase gene (Hinchee et al., Bio/Technology
6:915-922 (1988), the entirety of which is herein incorporated by reference) which encodes
glyphosate resistance; a nitrilase gene which confers resistance to bromoxynil (Stalker et al., J.
Biol. Chem. 263:6310-6314 (1988), the entirety of which is herein incorporated by reference); a
mutant acetolactate synthase gene (ALS) which confers imidazolinone or sulphonylurea
resistance (European Patent Application 154,204 (Sept. 11, 1985), the entirety of which is herein
incorporated by reference); and a methotrexate resistant DHFR gene (Thillet et al., J. Biol. Chem.
263:12500-12508 (1988), the entirety of which is herein incorporated by reference).

A vector or construct may also include a transit peptide. Incorporation of a suitable
chloroplast transit peptide may also be employed (European Patent Application Publication
Number 0218571, the entirety of which is herein incorporated by reference). Translational
enhancers may also be incorporated as part of the vector DNA. DNA constructs could contain
one or more 5' non-translated leader sequences which may serve to enhance expression of the
gene products from the resulting mRNA transcripts. Such sequences may be derived from the
promoter selected to express the gene or can be specifically modified to increase translation of
the mRNA. Such regions may also be obtained from viral RNAs, from suitable eukaryotic genes,
or from a synthetic gene sequence. For a review of optimizing expression of transgenes, see
Koziel et al., Plant Mol. Biol. 32:393-405 (1996), the entirety of which is herein incorporated by
reference.

A vector or construct may also include a screenable marker. Screenable markers may be
used to monitor expression. Exemplary screenable markers include: a β-glucuronidase or uidA
gene (GUS) which encodes an enzyme for which various chromogenic substrates are known
(Jefferson, Plant Mol. Biol. Rep. 5:387-405 (1987), the entirety of which is herein incorporated
by reference; Jefferson et al., EMBO J. 6:3901-3907 (1987), the entirety of which is herein
incorporated by reference); an R-locus gene, which encodes a product that regulates the
production of anthocyanin pigments (red color) in plant tissues (Dellaporta et al., Stadler
Symposium 11:263-282 (1988), the entirety of which is herein incorporated by reference); a
β-lactamase gene (Sutcliffe et al., Proc. Natl. Acad. Sci. (U.S.A.) 75:3737-3741 (1978), the entirety
of which is herein incorporated by reference), a gene which encodes an enzyme for which various
chromogenic substrates are known (e.g., PADAC, a chromogenic cephalosporin); a luciferase
gene (Ow et al., Science 234:856-859 (1986), the entirety of which is herein incorporated by
reference); a xylE gene (Zukowsky et al., Proc. Natl. Acad. Sci. (U.S.A.) 80:1101-1105 (1983),
the entirety of which is herein incorporated by reference) which encodes a catechol dioxygenase
that can convert chromogenic catechols; an α-amylase gene (Ikatu et al., Bio/Technol. 8:241-242
(1990), the entirety of which is herein incorporated by reference); a tyrosinase gene (Katz et al.,
J. Gen. Microbiol. 129:2703-2714 (1983), the entirety of which is herein incorporated by
reference) which encodes an enzyme capable of oxidizing tyrosine to DOPA and dopaquinone
which in turn condenses to melanin; an α-galactosidase, which will turn a chromogenic
α-galactose substrate.

Included within the terms "selectable or screenable marker genes" are also genes which
encode a secretable marker whose secretion can be detected as a means of identifying or selecting
for transformed cells. Examples include markers which encode a secretable antigen that can be
identified by antibody interaction, or even secretable enzymes which can be detected
catalytically. Secretable proteins fall into a number of classes, including small, diffusible
proteins which are detectable (e.g., by ELISA), small active enzymes which are detectable in
extracellular solution (e.g., α-amylase, β-lactamase, phosphinothricin transferase), or proteins
which are inserted or trapped in the cell wall (such as proteins which include a leader sequence
such as that found in the expression unit of extensin or tobacco PR-S). Other possible
selectable and/or screenable marker genes will be apparent to those of skill in the art.

There are many methods for introducing transforming nucleic acid molecules into plant
cells. Suitable methods are believed to include virtually any method by which nucleic acid
molecules may be introduced into a cell, such as by Agrobacterium infection or direct delivery of
nucleic acid molecules such as, for example, by PEG-mediated transformation, by
electroporation or by acceleration of DNA coated particles, etc. (Potrykus, Ann. Rev. Plant
Physiol. Plant Mol. Biol. 42:205-225 (1991), the entirety of which is herein incorporated by
reference; Vasil, Plant Mol. Biol. 25:925-937 (1994), the entirety of which is herein incorporated
by reference). For example, electroporation has been used to transform maize protoplasts
(Fromm et al., Nature 312:191-193 (1986), the entirety of which is herein incorporated by
reference).

Other vector systems suitable for introducing transforming DNA into a host plant cell
include but are not limited to binary artificial chromosome (BIBAC) vectors (Hamilton et al.,
Gene 200:107-116 (1997), the entirety of which is herein incorporated by reference); and
transfection with RNA viral vectors (Della-Cioppa et al., Ann. N.Y. Acad. Sci. (1996), 792
(Engineering Plants for Commercial Products and Applications), 57-61, the entirety of which is
herein incorporated by reference). Additional vector systems also include plant selectable YAC
vectors such as those described in Mullen et al., Molecular Breeding 4:449-457 (1988), the
entirety of which is herein incorporated by reference.

Technology for introduction of DNA into cells is well known to those of skill in the art.
Four general methods for delivering a gene into cells have been described: (1) chemical methods
(Graham and van der Eb, Virology 54:536-539 (1973), the entirety of which is herein
incorporated by reference); (2) physical methods such as microinjection (Capecchi, Cell
22:479-488 (1980), the entirety of which is herein incorporated by reference), electroporation
(Wong and Neumann, Biochem. Biophys. Res. Commun. 107:584-587 (1982); Fromm et al., Proc.
Natl. Acad. Sci. (U.S.A.) 82:5824-5828 (1985); U.S. Patent No. 5,384,253, all of which are herein
incorporated in their entirety); and the gene gun (Johnston and Tang, Methods Cell Biol.
43:353-365 (1994), the entirety of which is herein incorporated by reference); (3) viral vectors
(Clapp, Clin. Perinatol. 20:155-168 (1993); Lu et al., J. Exp. Med. 178:2089-2096 (1993); Eglitis
and Anderson, Biotechniques 6:608-614 (1988), all of which are herein incorporated in their
entirety); and (4) receptor-mediated mechanisms (Curiel et al., Hum. Gen. Ther. 3:147-154
(1992); Wagner et al., Proc. Natl. Acad. Sci. (USA) 89:6099-6103 (1992), both of which are
incorporated by reference in their entirety).

Acceleration methods that may be used include, for example, microprojectile
bombardment and the like. One example of a method for delivering transforming nucleic acid
molecules to plant cells is microprojectile bombardment. This method has been reviewed by
Yang and Christou (eds.), Particle Bombardment Technology for Gene Transfer, Oxford Press,
Oxford, England (1994), the entirety of which is herein incorporated by reference.
Non-biological particles (microprojectiles) may be coated with nucleic acids and delivered into
cells by a propelling force. Exemplary particles include those comprised of tungsten, gold,
platinum and the like.

A particular advantage of microprojectile bombardment, in addition to it being an
effective means of reproducibly transforming monocots, is that neither the isolation of
protoplasts (Christou et al., Plant Physiol. 87:671-674 (1988), the entirety of which is herein
incorporated by reference) nor susceptibility to Agrobacterium infection is required. An
illustrative embodiment of a method for delivering DNA into maize cells by acceleration is a
biolistics α-particle delivery system, which can be used to propel particles coated with DNA
through a screen, such as a stainless steel or Nytex screen, onto a filter surface covered with
maize cells cultured in suspension. Gordon-Kamm et al. describe the basic procedure for
coating tungsten particles with DNA (Gordon-Kamm et al., Plant Cell 2:603-618 (1990), the
entirety of which is herein incorporated by reference). The screen disperses the tungsten nucleic
acid particles so that they are not delivered to the recipient cells in large aggregates. A particle
delivery system suitable for use with the invention is the helium acceleration PDS-1000/He gun,
which is available from Bio-Rad Laboratories (Bio-Rad, Hercules, California) (Sanford et al.,
Technique 3:3-16 (1991), the entirety of which is herein incorporated by reference).

For the bombardment, cells in suspension may be concentrated on filters. Filters
containing the cells to be bombarded are positioned at an appropriate distance below the
microprojectile stopping plate. If desired, one or more screens are also positioned between the
gun and the cells to be bombarded.

Alternatively, immature embryos or other target cells may be arranged on solid culture
medium. The cells to be bombarded are positioned at an appropriate distance below the
microprojectile stopping plate. If desired, one or more screens are also positioned between the
acceleration device and the cells to be bombarded. Through the use of techniques set forth herein
one may obtain up to 1000 or more foci of cells transiently expressing a marker gene. The
number of cells in a focus which express the exogenous gene product 48 hours
post-bombardment often ranges from one to ten and averages one to three.

In bombardment transformation, one may optimize the pre-bombardment culturing
conditions and the bombardment parameters to yield the maximum numbers of stable
transformants. Both the physical and biological parameters for bombardment are important in
this technology. Physical factors are those that involve manipulating the DNA/microprojectile
precipitate or those that affect the flight and velocity of either the macro- or microprojectiles.
Biological factors include all steps involved in manipulation of cells before and immediately
after bombardment, the osmotic adjustment of target cells to help alleviate the trauma associated
with bombardment and also the nature of the transforming DNA, such as linearized DNA or
intact supercoiled plasmids. It is believed that pre-bombardment manipulations are especially
important for successful transformation of immature embryos.

In another alternative embodiment, plastids can be stably transformed. Methods
disclosed for plastid transformation in higher plants include the particle gun delivery of DNA
containing a selectable marker and targeting of the DNA to the plastid genome through
homologous recombination (Svab et al., Proc. Natl. Acad. Sci. (U.S.A.) 87:8526-8530 (1990);
Svab and Maliga, Proc. Natl. Acad. Sci. (U.S.A.) 90:913-917 (1993); Staub and Maliga, EMBO
J. 12:601-606 (1993); U.S. Patents 5,451,513 and 5,545,818, all of which are herein
incorporated by reference in their entirety).

Accordingly, it is contemplated that one may wish to adjust various aspects of the
bombardment parameters in small scale studies to fully optimize the conditions. One may
particularly wish to adjust physical parameters such as gap distance, flight distance, tissue
distance and helium pressure. One may also minimize the trauma reduction factors by modifying
conditions which influence the physiological state of the recipient cells and which may therefore
influence transformation and integration efficiencies. For example, the osmotic state, tissue
hydration and the subculture stage or cell cycle of the recipient cells may be adjusted for
optimum transformation. The execution of other routine adjustments will be known to those of
skill in the art in light of the present disclosure.
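
By way of illustration only, a small scale study of the four physical parameters named above
could be laid out as a full factorial grid. The following is a minimal sketch in Python; the
parameter names, units and numeric levels are hypothetical placeholders chosen for the example
and are not values taught by this disclosure.

```python
from itertools import product

# Hypothetical levels for each physical parameter (placeholders, not recommended settings).
grid = {
    "gap_distance_mm": (3, 6),
    "flight_distance_mm": (5, 10),
    "tissue_distance_cm": (6, 9),
    "helium_pressure_psi": (650, 1100),
}


def enumerate_conditions(levels):
    """Return every combination of parameter levels as a list of dicts."""
    names = list(levels)
    return [dict(zip(names, values)) for values in product(*(levels[n] for n in names))]


conditions = enumerate_conditions(grid)
for number, condition in enumerate(conditions, start=1):
    print(number, condition)
```

Each printed line is one candidate bombardment condition; two levels of four parameters give
sixteen conditions, which could be reduced to a fractional design where tissue is limiting.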

Agrobacterium-mediated transfer is a widely applicable system for introducing genes into
plant cells because the DNA can be introduced into whole plant tissues, thereby bypassing the
need for regeneration of an intact plant from a protoplast. The use of Agrobacterium-mediated
plant integrating vectors to introduce DNA into plant cells is well known in the art. See, for
example, the methods described by Fraley et al., Bio/Technology 3:629-635 (1985) and Rogers et
al., Methods Enzymol. 153:253-277 (1987), both of which are herein incorporated by reference in
their entirety. Further, the integration of the Ti-DNA is a relatively precise process resulting in
few rearrangements. The region of DNA to be transferred is defined by the border sequences and
intervening DNA is usually inserted into the plant genome as described (Spielmann et al., Mol.
Gen. Genet. 205:34 (1986), the entirety of which is herein incorporated by reference).

Modern Agrobacterium transformation vectors are capable of replication in E. coli as
well as Agrobacterium, allowing for convenient manipulations as described (Klee et al., In: Plant
DNA Infectious Agents, Hohn and Schell (eds.), Springer-Verlag, New York, pp. 179-203 (1985),
the entirety of which is herein incorporated by reference). Moreover, technological advances in
vectors for Agrobacterium-mediated gene transfer have improved the arrangement of genes and
restriction sites in the vectors to facilitate construction of vectors capable of expressing various
polypeptide coding genes. The vectors described have convenient multi-linker regions flanked
by a promoter and a polyadenylation site for direct expression of inserted polypeptide coding
genes and are suitable for present purposes (Rogers et al., Methods Enzymol. 153:253-277
(1987)). In addition, Agrobacterium containing both armed and disarmed Ti genes can be used
for the transformations. In those plant strains where Agrobacterium-mediated transformation is
efficient, it is the method of choice because of the facile and defined nature of the gene transfer.

A transgenic plant formed using Agrobacterium transformation methods typically
contains a single gene on one chromosome. Such transgenic plants can be referred to as being
heterozygous for the added gene. More preferred is a transgenic plant that is homozygous for the
added structural gene; i.e., a transgenic plant that contains two added genes, one gene at the same
locus on each chromosome of a chromosome pair. A homozygous transgenic plant can be
obtained by sexually mating (selfing) an independent segregant transgenic plant that contains a
single added gene, germinating some of the seed produced and analyzing the resulting plants
produced for the gene of interest.
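
The fraction of selfed progeny expected to be homozygous for the added gene can be estimated
with a simple Punnett-square enumeration. The sketch below, in Python, assumes a single
insertion at one locus, ordinary Mendelian segregation and no selection among gametes or
seedlings; the allele labels "T" (transgene present) and "-" (transgene absent) are
illustrative names only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def selfing_ratios(parent=("T", "-")):
    """Expected progeny frequencies keyed by number of transgene copies (0, 1 or 2)."""
    # Each parent gamete carries one of the two alleles with equal probability.
    tally = Counter(
        sum(allele == "T" for allele in pair) for pair in product(parent, repeat=2)
    )
    total = sum(tally.values())
    return {copies: Fraction(count, total) for copies, count in sorted(tally.items())}


ratios = selfing_ratios()
for copies, fraction in ratios.items():
    print(f"{copies} transgene copies: {fraction}")
```

Under these assumptions about one quarter of the germinated seed would be expected to carry the
gene at the same locus on both chromosomes of the pair, which is why only some of the seed
needs to be germinated and analyzed.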

It is also to be understood that two different transgenic plants can also be mated to
produce offspring that contain two independently segregating, exogenous genes. Selfing of
appropriate progeny can produce plants that are homozygous for both added, exogenous genes
that encode a polypeptide of interest. Back-crossing to a parental plant and out-crossing with a
non-transgenic plant are also contemplated, as is vegetative propagation.
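
As a worked illustration of the preceding paragraph, the expected fraction of selfed progeny
that are homozygous for both exogenous genes can be enumerated directly. The Python sketch
below assumes that the selfed plant carries one copy of each gene, that the two genes are
unlinked and that segregation is Mendelian; the labels "A", "B" and "-" are hypothetical.

```python
from fractions import Fraction
from itertools import product


def double_homozygote_fraction():
    """Fraction of selfed progeny carrying two copies of gene A and two copies of gene B."""
    # Unlinked loci: the four gamete types (A B, A -, - B, - -) are equally likely.
    gametes = list(product(("A", "-"), ("B", "-")))
    target = ("A", "B")
    hits = sum(
        1
        for egg, pollen in product(gametes, repeat=2)
        if egg == target and pollen == target
    )
    return Fraction(hits, len(gametes) ** 2)


fraction = double_homozygote_fraction()
print(fraction)
```

Under these assumptions roughly one progeny plant in sixteen would be homozygous at both loci,
so correspondingly more seed would need to be screened than for a single gene.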

Transformation of plant protoplasts can be achieved using methods based on calcium
phosphate precipitation, polyethylene glycol treatment, electroporation and combinations of these
treatments (See, for example, Potrykus et al., Mol. Gen. Genet. 205:193-200 (1986); Lorz et al.,
Mol. Gen. Genet. 199:178 (1985); Fromm et al., Nature 319:791 (1986); Uchimiya et al., Mol.
Gen. Genet. 204:204 (1986); Marcotte et al., Nature 335:454-457 (1988), all of which are herein
incorporated by reference in their entirety).

Application of these systems to different plant strains depends upon the ability to
regenerate that particular plant strain from protoplasts. Illustrative methods for the regeneration
of cereals from protoplasts are described (Fujimura et al., Plant Tissue Culture Letters 2:14
(1985); Toriyama et al., Theor. Appl. Genet. 205:34 (1986); Yamada et al., Plant Cell Rep. 4:85
(1986); Abdullah et al., Biotechnology 4:1087 (1986), all of which are herein incorporated by
reference in their entirety).

To transform plant strains that cannot be successfully regenerated from protoplasts, other
ways to introduce DNA into intact cells or tissues can be utilized. For example, regeneration of
cereals from immature embryos or explants can be effected as described (Vasil, Biotechnology
6:397 (1988), the entirety of which is herein incorporated by reference). In addition, "particle
gun" or high-velocity microprojectile technology can be utilized (Vasil et al., Bio/Technology
10:667 (1992), the entirety of which is herein incorporated by reference).

Using the latter technology, DNA is carried through the cell wall and into the cytoplasm
on the surface of small metal particles as described (Klein et al., Nature 328:70 (1987); Klein et
al., Proc. Natl. Acad. Sci. (U.S.A.) 85:8502-8505 (1988); McCabe et al., Bio/Technology 6:923
(1988), all of which are herein incorporated by reference in their entirety). The metal particles
penetrate through several layers of cells and thus allow the transformation of cells within tissue
explants.

Other methods of cell transformation can also be used and include but are not limited to
introduction of DNA into plants by direct DNA transfer into pollen (Hess et al., Intern Rev.
Cytol. 107:361 (1987); Luo et al., Plant Mol. Biol. Reporter 6:165 (1988), all of which are herein
incorporated by reference in their entirety), by direct injection of DNA into reproductive organs
of a plant (Pena et al., Nature 325:214 (1987), the entirety of which is herein incorporated by
reference), or by direct injection of DNA into the cells of immature embryos followed by the
rehydration of desiccated embryos (Neuhaus et al., Theor. Appl. Genet. 75:30 (1987), the entirety
of which is herein incorporated by reference).

The regeneration, development and cultivation of plants from single plant protoplast
transformants or from various transformed explants are well known in the art (Weissbach and
Weissbach, In: Methods for Plant Molecular Biology, Academic Press, San Diego, CA, (1988),
the entirety of which is herein incorporated by reference). This regeneration and growth process
typically includes the steps of selecting transformed cells and culturing those individualized cells
through the usual stages of embryonic development through the rooted plantlet stage. Transgenic
embryos and seeds are similarly regenerated. The resulting transgenic rooted shoots are
thereafter planted in an appropriate plant growth medium such as soil.

The development or regeneration of plants containing the foreign, exogenous gene that
encodes a protein of interest is well known in the art. Preferably, the regenerated plants are
self-pollinated to provide homozygous transgenic plants. Otherwise, pollen obtained from the
regenerated plants is crossed to seed-grown plants of agronomically important lines. Conversely,
pollen from plants of these important lines is used to pollinate regenerated plants. A transgenic
plant of the invention containing a desired polypeptide is cultivated using methods well known to
one skilled in the art.

There are a variety of methods for the regeneration of plants from plant tissue. The
particular method of regeneration will depend on the starting plant tissue and the particular plant
species to be regenerated.

Methods for transforming dicots, primarily by use of Agrobacterium tumefaciens, and
obtaining transgenic plants have been published for cotton (U.S. Patent No. 5,004,863; U.S.
Patent No. 5,159,135; U.S. Patent No. 5,518,908, all of which are herein incorporated by
reference in their entirety); soybean (U.S. Patent No. 5,569,834; U.S. Patent No. 5,416,011;
McCabe et al., Bio/Technology 6:923 (1988); Christou et al., Plant Physiol. 87:671-674 (1988);
all of which are herein incorporated by reference in their entirety); Brassica (U.S. Patent No.
5,463,174, the entirety of which is herein incorporated by reference); peanut (Cheng et al., Plant
Cell Rep. 15:653-657 (1996); McKently et al., Plant Cell Rep. 14:699-703 (1995), all of which
are herein incorporated by reference in their entirety); papaya; and pea (Grant et al., Plant Cell
Rep. 15:254-258 (1995), the entirety of which is herein incorporated by reference).

Transformation of monocotyledons using electroporation, particle bombardment and
Agrobacterium has also been reported. Transformation and plant regeneration have been
achieved in asparagus (Bytebier et al., Proc. Natl. Acad. Sci. (USA) 84:5354 (1987), the entirety
of which is herein incorporated by reference); barley (Wan and Lemaux, Plant Physiol. 104:31
(1994), the entirety of which is herein incorporated by reference); maize (Rhodes et al., Science
240:204 (1988); Gordon-Kamm et al., Plant Cell 2:603-618 (1990); Fromm et al.,
Bio/Technology 8:833 (1990); Koziel et al., Bio/Technology 11:194 (1993); Armstrong et al.,
Crop Science 35:550-557 (1995); all of which are herein incorporated by reference in their
entirety); oat (Somers et al., Bio/Technology 10:1589 (1992), the entirety of which is herein
incorporated by reference); orchard grass (Horn et al., Plant Cell Rep. 7:469 (1988), the entirety
of which is herein incorporated by reference); rice (Toriyama et al., Theor. Appl. Genet. 205:34
(1986); Park et al., Plant Mol. Biol. 32:1135-1148 (1996); Abedinia et al., Aust. J. Plant Physiol.
24:133-141 (1997); Zhang and Wu, Theor. Appl. Genet. 76:835 (1988); Zhang et al., Plant Cell
Rep. 7:379 (1988); Battraw and Hall, Plant Sci. 86:191-202 (1992); Christou et al.,
Bio/Technology 9:957 (1991), all of which are herein incorporated by reference in their entirety);
rye (De la Pena et al., Nature 325:214 (1987), the entirety of which is herein incorporated by
reference); sugarcane (Bower and Birch, Plant J. 2:409 (1992), the entirety of which is herein
incorporated by reference); tall fescue (Wang et al., Bio/Technology 10:691 (1992), the entirety
of which is herein incorporated by reference) and wheat (Vasil et al., Bio/Technology 10:667
(1992), the entirety of which is herein incorporated by reference; U.S. Patent No. 5,631,152, the
entirety of which is herein incorporated by reference).

Assays for gene expression based on the transient expression of cloned nucleic acid
constructs have been developed by introducing the nucleic acid molecules into plant cells by
polyethylene glycol treatment, electroporation, or particle bombardment (Marcotte et al., Nature
335:454-457 (1988), the entirety of which is herein incorporated by reference; Marcotte et al.,
Plant Cell 1:523-532 (1989), the entirety of which is herein incorporated by reference; McCarty
et al., Cell 66:895-905 (1991), the entirety of which is herein incorporated by reference; Hattori
et al., Genes Dev. 6:609-618 (1992), the entirety of which is herein incorporated by reference;
Goff et al., EMBO J. 9:2517-2522 (1990), the entirety of which is herein incorporated by
reference). Transient expression systems may be used to functionally dissect gene constructs (see
generally, Maliga et al., Methods in Plant Molecular Biology, Cold Spring Harbor Press (1995)).

Any of the nucleic acid molecules of the invention may be introduced into a plant cell in a
permanent or transient manner in combination with other genetic elements such as vectors,
promoters, enhancers, etc. Further, any of the nucleic acid molecules of the invention may be
introduced into a plant cell in a manner that allows for overexpression of the protein or fragment
thereof encoded by the nucleic acid molecule.

Cosuppression is the reduction in expression levels, usually at the level of RNA, of a
particular endogenous gene or gene family by the expression of a homologous sense construct
that is capable of transcribing mRNA of the same strandedness as the transcript of the
endogenous gene (Napoli et al., Plant Cell 2:279-289 (1990), the entirety of which is herein
incorporated by reference; van der Krol et al., Plant Cell 2:291-299 (1990), the entirety of which
is herein incorporated by reference). Cosuppression may result from stable transformation with a
single copy nucleic acid molecule that is homologous to a nucleic acid sequence found within the
cell (Prolls and Meyer, Plant J. 2:465-475 (1992), the entirety of which is herein incorporated by
reference) or with multiple copies of a nucleic acid molecule that is homologous to a nucleic acid
sequence found within the cell (Mittlesten et al., Mol. Gen. Genet. 244:325-330 (1994), the
entirety of which is herein incorporated by reference). Genes, even though different, linked to
homologous promoters may result in the cosuppression of the linked genes (Vaucheret, C.R.
Acad. Sci. III 316:1471-1483 (1993), the entirety of which is herein incorporated by reference).

This technique has, for example, been applied to generate white flowers from red petunia
and tomatoes that do not ripen on the vine. Up to 50% of petunia transformants that contained a
sense copy of the chalcone synthase (CHS) gene produced white flowers or floral sectors; this was
the result of the post-transcriptional loss of mRNA encoding CHS (Flavell, Proc. Natl. Acad. Sci.
(U.S.A.) 91:3490-3496 (1994), the entirety of which is herein incorporated by reference; van
Blokland et al., Plant J. 6:861-877 (1994), the entirety of which is herein incorporated by
reference). Cosuppression may require the coordinate transcription of the transgene and the
endogenous gene and can be reset by a developmental control mechanism (Jorgensen, Trends
Biotechnol. 8:340-344 (1990), the entirety of which is herein incorporated by reference; Meins
and Kunz, In: Gene Inactivation and Homologous Recombination in Plants, Paszkowski (ed.),
pp. 335-348, Kluwer Academic, Netherlands (1994), the entirety of which is herein incorporated
by reference).

It is understood that one or more of the nucleic acids of the invention may be introduced
into a plant cell and transcribed using an appropriate promoter with such transcription resulting
in the cosuppression of an endogenous protein.

Antisense approaches are a way of preventing or reducing gene function by targeting the
genetic material (Mol et al., FEBS Lett. 265:427-430 (1990), the entirety of which is herein
incorporated by reference). The objective of the antisense approach is to use a sequence
complementary to the target gene to block its expression and create a mutant cell line or
organism in which the level of a single chosen protein is selectively reduced or abolished.
Antisense techniques have several advantages over other 'reverse genetic' approaches. The site
of inactivation and its developmental effect can be manipulated by the choice of promoter for
antisense genes or by the timing of external application or microinjection. The specificity of
antisense can be manipulated by selecting either unique regions of the target gene or regions
where it shares homology with other related genes (Hiatt et al., In: Genetic Engineering, Setlow
(ed.), Vol. 11, New York: Plenum 49-63 (1989), the entirety of which is herein incorporated by
reference).

The principle of regulation by antisense RNA is that RNA that is complementary to the
target mRNA is introduced into cells, resulting in specific RNA:RNA duplexes being formed by
base pairing between the antisense substrate and the target mRNA (Green et al., Annu. Rev.
Biochem. 55:569-597 (1986), the entirety of which is herein incorporated by reference). Under
one embodiment, the process involves the introduction and expression of an antisense gene
sequence. Such a sequence is one in which part or all of the normal gene sequences are placed
under a promoter in inverted orientation so that the 'wrong' or complementary strand is
transcribed into a noncoding antisense RNA that hybridizes with the target mRNA and interferes
with its expression (Takayama and Inouye, Crit. Rev. Biochem. Mol. Biol. 25:155-184 (1990), the
entirety of which is herein incorporated by reference). An antisense vector is constructed by
standard procedures and introduced into cells by transformation, transfection, electroporation,
microinjection, infection, etc. The type of transformation and choice of vector will determine
whether expression is transient or stable. The promoter used for the antisense gene may
influence the level, timing, tissue, specificity, or inducibility of the antisense inhibition.
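
The complementarity on which antisense inhibition relies can be illustrated with a minimal
Python sketch. It derives the antisense RNA for a short, made-up target mRNA fragment and
checks that the two strands pair base for base in antiparallel orientation. The sequence and
the function names are hypothetical and are not taken from the sequence listing.

```python
# Watson-Crick pairing rules for RNA.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense_rna(mrna: str) -> str:
    """Return the antisense strand (written 5'->3') of an mRNA given 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


def forms_full_duplex(sense: str, antisense: str) -> bool:
    """True if every base of `sense` pairs with the antiparallel `antisense`."""
    if len(sense) != len(antisense):
        return False
    return all(
        RNA_COMPLEMENT[s] == a
        for s, a in zip(sense.upper(), reversed(antisense.upper()))
    )


target = "AUGGCUUCAGGA"  # hypothetical mRNA fragment, not a disclosed sequence
anti = antisense_rna(target)
print(anti)                             # UCCUGAAGCCAU
print(forms_full_duplex(target, anti))  # True
```

In a construct, the same effect is obtained by placing the gene segment in inverted
orientation behind the promoter, so that the cell itself transcribes the complementary strand.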

It is understood that the activity of a protein in a plant cell may be reduced or depressed
by growing a transformed plant cell containing a nucleic acid molecule of the present invention
whose non-transcribed strand encodes a protein or fragment thereof.

Antibodies have been expressed in plants (Hiatt et al., Nature 342:16-18 (1989), the
entirety of which is herein incorporated by reference; Conrad and Fielder, Plant Mol. Biol.
26:1023-1030 (1994), the entirety of which is herein incorporated by reference). Cytoplasmic
expression of an scFv (single-chain Fv antibody) has been reported to delay infection by
artichoke mottled crinkle virus. Transgenic plants that express antibodies directed against
endogenous proteins may exhibit a physiological effect (Philips et al., EMBO J. 16:4489-4496
(1997), the entirety of which is herein incorporated by reference; Marion-Poll, Trends in Plant
Science 2:441-448 (1997), the entirety of which is herein incorporated by reference). For
example, expressed anti-abscisic antibodies have been reported to result in a general
perturbation of seed development (Philips et al., EMBO J. 16:4489-4496 (1997)).

Antibodies that are catalytic may also be expressed in plants (abzymes). The principle
behind abzymes is that since antibodies may be raised against many molecules, this recognition
ability can be directed toward generating antibodies that bind transition states to force a chemical
reaction forward (Persidas, Nature Biotechnology 15:1313-1315 (1997), the entirety of which is
herein incorporated by reference; Baca et al., Ann. Rev. Biophys. Biomol. Struct. 26:461-493
(1997), the entirety of which is herein incorporated by reference). The catalytic abilities of
abzymes may be enhanced by site directed mutagenesis. Examples of abzymes are
set forth in U.S. Patent No. 5,658,753; U.S. Patent No. 5,632,990; U.S. Patent No. 5,631,137;
U.S. Patent No. 5,602,015; U.S. Patent No. 5,559,538; U.S. Patent No. 5,576,174; U.S. Patent No.
5,500,358; U.S. Patent No. 5,318,897; U.S. Patent No. 5,298,409; U.S. Patent No. 5,258,289 and
U.S. Patent No. 5,194,585, all of which are herein incorporated in their entirety.

It is understood that any of the antibodies of the invention may be expressed in plants and
that such expression can result in a physiological effect. It is also understood that any of the
expressed antibodies may be catalytic.

(d) Antibodies 

One aspect of the invention concerns antibodies, single-chain antigen binding molecules,
or other proteins that specifically bind to one or more of the protein or peptide molecules of the
invention and their homologues, fusions or fragments. In a preferred embodiment, an antibody
of the present invention binds to an amino acid sequence selected from the group consisting of
SEQ ID NO: 622 through 625. Such antibodies may be used to quantitatively or qualitatively
detect the protein or peptide molecules of the invention. As used herein, an antibody or peptide
is said to "specifically bind" to a protein or peptide molecule of the invention if such binding is
not competitively inhibited by the presence of non-related molecules.

Nucleic acid molecules that encode all or part of the protein of the invention can be
expressed, via recombinant means, to yield protein or peptides that can in turn be used to elicit
antibodies that are capable of binding the expressed protein or peptide. Such antibodies may be
used in immunoassays for that protein. Such protein-encoding molecules, or their fragments, may
be a "fusion" molecule (i.e., a part of a larger nucleic acid molecule) such that, upon expression,
a fusion protein is produced. It is understood that any of the nucleic acid molecules of the
invention may be expressed, via recombinant means, to yield proteins or peptides encoded by
these nucleic acid molecules.

The antibodies that specifically bind proteins and protein fragments of the invention may
be polyclonal or monoclonal and may comprise intact immunoglobulins, antigen binding
portions of immunoglobulins (such as F(ab') or F(ab')2 fragments), or single-chain
immunoglobulins producible, for example, via recombinant means. It is understood that
practitioners are familiar with the standard resource materials which describe specific conditions
and procedures for the construction, manipulation and isolation of antibodies (see, for example,
Harlow and Lane, In: Antibodies: A Laboratory Manual, Cold Spring Harbor Press, Cold Spring
Harbor, New York (1988), the entirety of which is herein incorporated by reference).

Murine monoclonal antibodies are particularly preferred. BALB/c mice are preferred for
this purpose; however, equivalent strains may also be used. The animals are preferably
immunized with approximately 25 µg of purified protein (or fragment thereof) that has been
emulsified in a suitable adjuvant (such as TiterMax adjuvant (Vaxcel, Norcross, GA)).
Immunization is preferably conducted at two intramuscular sites, one intraperitoneal site and one
subcutaneous site at the base of the tail. An additional i.v. injection of approximately 25 µg of
antigen is preferably given in normal saline three weeks later. Approximately 11 days
after the second injection, the mice may be bled and the blood screened for the presence of
anti-protein or peptide antibodies. Preferably, a direct binding Enzyme-Linked Immunosorbent
Assay (ELISA) is employed for this purpose.

More preferably, the mouse having the highest antibody titer is given a third i.v. injection
of approximately 25 µg of the same protein or fragment. The splenic leukocytes from this animal
may be recovered 3 days later and then permitted to fuse, most preferably, using polyethylene
glycol, with cells of a suitable myeloma cell line (such as, for example, the P3X63Ag8.653
myeloma cell line). Hybridoma cells are selected by culturing the cells under "HAT"
(hypoxanthine-aminopterin-thymidine) selection for about one week. The resulting clones may
then be screened for their capacity to produce monoclonal antibodies ("mAbs"), preferably by
direct ELISA.

In one embodiment, anti-protein or peptide monoclonal antibodies are isolated using a
fusion of a protein or peptide of the invention, or conjugate of a protein or peptide of the
invention, as immunogens. Thus, for example, a group of mice can be immunized using a fusion
protein emulsified in Freund's complete adjuvant (e.g., approximately 50 µg of antigen per
immunization). At three week intervals, an identical amount of antigen is emulsified in Freund's
incomplete adjuvant and used to immunize the animals. Ten days following the third
immunization, serum samples are taken and evaluated for the presence of antibody. If antibody
titers are too low, a fourth booster can be employed. Polysera capable of binding the protein or
peptide can also be obtained using this method.

In a preferred procedure for obtaining monoclonal antibodies, the spleens of the
above-described immunized mice are removed, disrupted and immune splenocytes are isolated over a
ficoll gradient. The isolated splenocytes are fused, using polyethylene glycol, with
BALB/c-derived HGPRT (hypoxanthine guanine phosphoribosyl transferase) deficient P3X63Ag8.653
plasmacytoma cells. The fused cells are plated into 96 well microtiter plates and screened for
hybridoma fusion cells by their capacity to grow in culture medium supplemented with
hypoxanthine, aminopterin and thymidine for approximately 2-3 weeks.

Hybridoma cells that arise from such incubation are preferably screened for their capacity
to produce an immunoglobulin that binds to a protein of interest. An indirect ELISA may be
used for this purpose. In brief, the supernatants of hybridomas are incubated in microtiter wells
that contain immobilized protein. After washing, the titer of bound immunoglobulin can be
determined using, for example, a goat anti-mouse antibody conjugated to horseradish peroxidase.
After additional washing, the amount of immobilized enzyme is determined (for example
through the use of a chromogenic substrate). Such screening is performed as quickly as possible
after the identification of the hybridoma in order to ensure that a desired clone is not overgrown
by non-secreting neighbor cells. Desirably, the fusion plates are screened several times since the
rates of hybridoma growth vary. In a preferred sub-embodiment, a different antigenic form may
be used to screen the hybridoma. Thus, for example, the splenocytes may be immunized with
one immunogen, but the resulting hybridomas can be screened using a different immunogen. It is
understood that any of the protein or peptide molecules of the invention may be used to raise
antibodies.

As discussed below, such antibody molecules or their fragments may be used for
diagnostic purposes. Where the antibodies are intended for diagnostic purposes, it may be
desirable to derivatize them, for example with a ligand group (such as biotin) or a detectable
marker group (such as a fluorescent group, a radioisotope or an enzyme).

The ability to produce antibodies that bind the protein or peptide molecules of the
invention permits the identification of mimetic compounds derived from those molecules. These
mimetic compounds may contain a fragment of the protein or peptide or merely a structurally
similar region and nonetheless exhibit an ability to specifically bind to antibodies directed
against that compound.

It is understood that any of the agents of the invention can be substantially purified and/or 
be biologically active and/or recombinant. 

(e) Exemplary Uses 

Nucleic acid molecules and fragments thereof of the invention may be employed to obtain
other nucleic acid molecules from the same species (e.g., nucleic acid molecules from maize may
be utilized to obtain other nucleic acid molecules from maize). Such nucleic acid molecules include
the nucleic acid molecules that encode the complete coding sequence of a protein and promoters
and flanking sequences of such molecules. In addition, such nucleic acid molecules include
nucleic acid molecules that encode other isozymes or gene family members. Such molecules
can be readily obtained by using the above-described nucleic acid molecules or fragments thereof
to screen cDNA or genomic libraries. Methods for forming such libraries are well known in the
art.

Nucleic acid molecules and fragments thereof of the invention may also be employed to
obtain nucleic acid homologs. Such homologs include the nucleic acid molecules of other plants
or other organisms (e.g., alfalfa, Arabidopsis, barley, Brassica, broccoli, cabbage, citrus, cotton,
garlic, oat, oilseed rape, onion, canola, flax, an ornamental plant, pea, peanut, pepper, potato,
rice, rye, sorghum, strawberry, sugarcane, sugarbeet, tomato, wheat, poplar, pine, fir, eucalyptus,
apple, lettuce, lentils, grape, banana, tea, turf grasses, sunflower, oil palm, Phaseolus, etc.)
including the nucleic acid molecules that encode, in whole or in part, protein homologs of other
plant species or other organisms, sequences of genetic elements, such as promoters and
transcriptional regulatory elements. Particularly preferred plants are selected from the group
consisting of maize, canola, soybean, crambe, mustard, castor bean, peanut, sesame, cottonseed,
linseed, safflower, oil palm, flax and sunflower.

Such molecules can be readily obtained by using the above-described nucleic acid
molecules or fragments thereof to screen cDNA or genomic libraries obtained from such plant
species. Methods for forming such libraries are well known in the art. Such homolog molecules
may differ in their nucleotide sequences from those found in one or more of SEQ ID NOs: 1-4,
6-29 or complements thereof because complete complementarity is not needed for stable
hybridization. The nucleic acid molecules of the invention therefore also include molecules that,
although capable of specifically hybridizing with the nucleic acid molecules, may lack "complete
complementarity."

Any of a variety of methods may be used to obtain one or more of the above-described
nucleic acid molecules (Zamechik et al., Proc. Natl. Acad. Sci. (U.S.A.) 83:4143-4146 (1986);
Goodchild et al., Proc. Natl. Acad. Sci. (U.S.A.) 85:5507-5511 (1988); Wickstrom et al., Proc.
Natl. Acad. Sci. (U.S.A.) 85:1028-1032 (1988); Holt et al., Molec. Cell. Biol. 8:963-973 (1988);
Gerwirtz et al., Science 242:1303-1306 (1988); Anfossi et al., Proc. Natl. Acad. Sci. (U.S.A.)
86:3379-3383 (1989); Becker et al., EMBO J. 8:3685-3691 (1989)). Automated nucleic acid
synthesizers may be employed for this purpose. In lieu of such synthesis, the disclosed nucleic
acid molecules may be used to define a pair of primers that can be used with the polymerase
chain reaction (Mullis et al., Cold Spring Harbor Symp. Quant. Biol. 51:263-273 (1986); Erlich
et al., European Patent 50,424; European Patent 84,796; European Patent 258,017; European
Patent 237,362; Mullis, European Patent 201,184; Mullis et al., U.S. Patent 4,683,202; Erlich,
U.S. Patent 4,582,788; and Saiki et al., U.S. Patent 4,683,194) to amplify and obtain any desired
nucleic acid molecule or fragment.
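
How a known sequence defines a primer pair can be sketched in a few lines of Python. This is
a minimal illustration under stated assumptions, not a primer-design procedure: the template
is a made-up sequence, the primer length of 8 is chosen only for brevity (practical primers are
typically longer), and melting temperature, specificity and secondary structure are ignored.
The function names are hypothetical.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return seq.upper().translate(DNA_COMPLEMENT)[::-1]


def primer_pair(template: str, length: int = 8) -> tuple[str, str]:
    """Return (forward, reverse) primers flanking the whole template.

    The forward primer copies the 5' end of the given strand; the reverse
    primer is the reverse complement of its 3' end, so that it anneals to
    the given strand and is extended back toward the forward primer.
    """
    if len(template) < 2 * length:
        raise ValueError("template too short for non-overlapping primers")
    forward = template[:length].upper()
    reverse = reverse_complement(template[-length:])
    return forward, reverse


template = "ATGGCTTCAGGATCCGTTAACCTGAGT"  # hypothetical, not a disclosed sequence
fwd, rev = primer_pair(template)
print(fwd)  # ATGGCTTC
print(rev)  # ACTCAGGT
```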

Promoter sequences and other genetic elements, including but not limited to
transcriptional regulatory flanking sequences, associated with one or more of the disclosed
nucleic acid sequences can also be obtained using the disclosed nucleic acid sequence provided
herein. In one embodiment, such sequences are obtained by incubating nucleic acid molecules of
the present invention with members of genomic libraries and recovering clones that hybridize to
such nucleic acid molecules thereof. In a second embodiment, methods of "chromosome
walking" or inverse PCR may be used to obtain such sequences (Frohman et al., Proc. Natl.
Acad. Sci. (U.S.A.) 85:8998-9002 (1988); Ohara et al., Proc. Natl. Acad. Sci. (U.S.A.)
86:5673-5677 (1989); Pang et al., Biotechniques 22:1046-1048 (1997); Huang et al., Methods Mol. Biol.
69:89-96 (1997); Huang et al., Methods Mol. Biol. 67:287-294 (1997); Benkel et al., Genet. Anal.
13:123-127 (1996); Hartl et al., Methods Mol. Biol. 55:293-301 (1996)). The term "chromosome
walking" means a process of extending a genetic map by successive hybridization steps.

The nucleic acid molecules of the invention may be used to isolate promoters of cell
enhanced, cell specific, tissue enhanced, tissue specific, developmentally or environmentally
regulated expression profiles. Isolation and functional analysis of the 5' flanking promoter
sequences of these genes from genomic libraries, for example, using genomic screening methods
and PCR techniques would result in the isolation of useful promoters and transcriptional
regulatory elements. These methods are known to those of skill in the art and have been
described (See, for example, Birren et al., Genome Analysis: Analyzing DNA, 1, (1997), Cold
Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.). Promoters obtained utilizing the
nucleic acid molecules of the invention could also be modified to affect their control
characteristics. Examples of such modifications would include but are not limited to enhancer
sequences. Such genetic elements could be used to enhance gene expression of new and existing
traits for crop improvement.

Another subset of the nucleic acid molecules of the invention includes nucleic acid
molecules that are markers. The markers can be used in a number of conventional ways in the
field of molecular genetics. Such markers include nucleic acid molecules SEQ ID NOs: 1-4, 6-29
or complements thereof or fragments of either that can act as markers and other nucleic acid
molecules of the present invention that can act as markers.

Genetic markers of the invention include "dominant" or "codominant" markers.
"Codominant markers" reveal the presence of two or more alleles (two per diploid individual) at
a locus. "Dominant markers" reveal the presence of only a single allele per locus. The presence
of the dominant marker phenotype (e.g., a band of DNA) is an indication that one allele is in
either the homozygous or heterozygous condition. The absence of the dominant marker
phenotype (e.g., absence of a DNA band) is merely evidence that "some other" undefined allele
is present. In the case of populations where individuals are predominantly homozygous and loci
are predominantly dimorphic, dominant and codominant markers can be equally valuable. As
populations become more heterozygous and multi-allelic, codominant markers often become
more informative of the genotype than dominant markers. Marker molecules can be, for
example, capable of detecting polymorphisms such as single nucleotide polymorphisms (SNPs).

The genomes of animals and plants naturally undergo spontaneous mutation in the course
of their continuing evolution (Gusella, Ann. Rev. Biochem. 55:831-854 (1986)). A
"polymorphism" is a variation or difference in the sequence of the gene or its flanking regions
that arises in some of the members of a species. The variant sequence and the "original"
sequence co-exist in the species' population. In some instances, such co-existence is in stable or
quasi-stable equilibrium.

A polymorphism is thus said to be "allelic," in that, due to the existence of the
polymorphism, some members of a species may have the original sequence (i.e., the original
"allele") whereas other members may have the variant sequence (i.e., the variant "allele"). In the
simplest case, only one variant sequence may exist and the polymorphism is thus said to be
di-allelic. In other cases, the species' population may contain multiple alleles and the
polymorphism is termed tri-allelic, etc. A single gene may have multiple different unrelated
polymorphisms. For example, it may have a di-allelic polymorphism at one site and a
multi-allelic polymorphism at another site.

The variation that defines the polymorphism may range from a single nucleotide variation
to the insertion or deletion of extended regions within a gene. In some cases, the DNA sequence
variations are in regions of the genome that are characterized by short tandem repeats (STRs) that
include tandem di- or tri-nucleotide repeated motifs. Polymorphisms
characterized by such tandem repeats are referred to as "variable number tandem repeat"
("VNTR") polymorphisms. VNTRs have been used in identity analysis (Weber, U.S. Patent
5,075,217; Armour et al., FEBS Lett. 307:113-115 (1992); Jones et al., Eur. J. Haematol.
39:144-147 (1987); Horn et al., PCT Patent Application WO91/14003; Jeffreys, European Patent
Application 370,719; Jeffreys, U.S. Patent 5,175,082; Jeffreys et al., Amer. J. Hum. Genet.
39:11-24 (1986); Jeffreys et al., Nature 316:16-19 (1985); Gray et al., Proc. R. Acad. Soc. Lond.
243:241-253 (1991); Moore et al., Genomics 10:654-660 (1991); Jeffreys et al., Anim. Genet.
18:1-15 (1987); Hillel et al., Anim. Genet. 20:145-155 (1989); Hillel et al., Genet. 124:783-789
(1990)).
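
As a minimal sketch of what distinguishes VNTR alleles, the following Python fragment counts
the longest uninterrupted run of a repeat motif in two made-up alleles that differ only in the
number of CA repeats. The alleles, flanking bases and function name are hypothetical; in
practice the repeat number is usually inferred from the length of an amplified fragment rather
than read from sequence.

```python
import re


def max_tandem_repeats(seq: str, motif: str) -> int:
    """Longest uninterrupted run of `motif` in `seq`, in repeat units."""
    motif = motif.upper()
    runs = re.findall(f"(?:{re.escape(motif)})+", seq.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


# Two hypothetical alleles with identical flanks and different repeat numbers.
allele_a = "GGT" + "CA" * 5 + "TTG"
allele_b = "GGT" + "CA" * 7 + "TTG"

print(max_tandem_repeats(allele_a, "CA"))  # 5
print(max_tandem_repeats(allele_b, "CA"))  # 7
print(len(allele_b) - len(allele_a))       # 4 (two extra dinucleotide units)
```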

The detection of polymorphic sites in a sample of DNA may be facilitated through the use
of nucleic acid amplification methods. Such methods specifically increase the concentration of
polynucleotides that span the polymorphic site, or include that site and sequences located either
distal or proximal to it. Such amplified molecules can be readily detected by gel electrophoresis
or other means.

In an alternative embodiment, such polymorphisms can be detected through the use of a
marker nucleic acid molecule that is physically linked to such polymorphism(s). For this
purpose, marker nucleic acid molecules comprising a nucleotide sequence of a polynucleotide
located within 1 Mb of the polymorphism(s) and more preferably within 100 kb of the
polymorphism(s) and most preferably within 10 kb of the polymorphism(s) can be employed.
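
The three distance ranges stated above can be expressed as a simple classification. The
sketch below assumes that marker and polymorphism lie on the same chromosome, that positions
are given in base pairs, and that "within" includes the boundary value; the function name and
the returned labels are hypothetical.

```python
def linkage_tier(marker_bp: int, polymorphism_bp: int) -> str:
    """Classify a marker by its physical distance from a polymorphism."""
    distance = abs(marker_bp - polymorphism_bp)
    if distance <= 10_000:       # within 10 kb
        return "most preferred"
    if distance <= 100_000:      # within 100 kb
        return "more preferred"
    if distance <= 1_000_000:    # within 1 Mb
        return "suitable"
    return "outside stated range"


print(linkage_tier(1_250_000, 1_255_000))  # most preferred
print(linkage_tier(0, 60_000))             # more preferred
print(linkage_tier(0, 900_000))            # suitable
```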

The identification of a polymorphism can be determined in a variety of ways. By
correlating the presence or absence of the polymorphism in a plant with the presence or absence
of a phenotype, it is possible to predict the phenotype of that plant. If a polymorphism creates or
destroys a restriction endonuclease cleavage site, or if it results in the loss or insertion of DNA
(e.g., a VNTR polymorphism), it will alter the size or profile of the DNA fragments that are
generated by digestion with that restriction endonuclease. As such, organisms that possess a
variant sequence can be distinguished from those having the original sequence by restriction
fragment analysis. Polymorphisms that can be identified in this manner are termed "restriction
fragment length polymorphisms" ("RFLPs") (Glassberg, UK Patent Application 2135774;
Skolnick et al., Cytogen. Cell Genet. 32:58-67 (1982); Botstein et al., Am. J. Hum. Genet.
32:314-331 (1980); Fischer et al., PCT Application WO90/13668; Uhlen, PCT Application
WO90/11369).
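
The effect of a polymorphism on a restriction fragment pattern can be sketched as follows.
The example assumes a linear DNA molecule and uses the EcoRI recognition sequence GAATTC,
which is cut after the first G; the two sequences are made up, and the variant differs from
the original by a single base that destroys the second site. The function name is
hypothetical.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Return fragment lengths after cutting a linear sequence at every `site`.

    `cut_offset` is the position of the cut within the recognition site
    (1 for EcoRI, which cuts G^AATTC).
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Hypothetical alleles: the variant carries A->G within the second site.
original = "ATCG" + "GAATTC" + "TTAGCCGAT" + "GAATTC" + "GGA"
variant = "ATCG" + "GAATTC" + "TTAGCCGAT" + "GAGTTC" + "GGA"

print(digest(original))  # [5, 15, 8]
print(digest(variant))   # [5, 23]
```

The loss of the 15 and 8 base fragments and the appearance of a 23 base fragment is the kind
of difference that restriction fragment analysis detects.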

Polymorphisms can also be identified by Single Strand Conformation Polymorphism
(SSCP) analysis (Elles, Methods in Molecular Medicine: Molecular Diagnosis of Genetic
Diseases, Humana Press (1996); Orita et al., Genomics 5:874-879 (1989)). A number of
protocols have been described for SSCP including, but not limited to, Lee et al., Anal. Biochem.
205:289-293 (1992); Suzuki et al., Anal. Biochem. 192:82-84 (1991); Lo et al., Nucleic Acids
Research 20:1005-1009 (1992); Sarkar et al., Genomics 13:441-443 (1992). It is understood that
one or more of the nucleic acids of the invention may be utilized as markers or probes to detect
polymorphisms by SSCP analysis.

Polymorphisms may also be found using a DNA fingerprinting technique called amplified
fragment length polymorphism (AFLP), which is based on the selective PCR amplification of
restriction fragments from a total digest of genomic DNA to profile that DNA (Vos et al.,
Nucleic Acids Res. 23:4407-4414 (1995)). This method allows for the specific co-amplification
of high numbers of restriction fragments, which can be visualized by PCR without knowledge of
the nucleic acid sequence. It is understood that one or more of the nucleic acids of the invention
may be utilized as markers or probes to detect polymorphisms by AFLP analysis or for
fingerprinting RNA.

Polymorphisms may also be found using random amplified polymorphic DNA (RAPD)
(Williams et al., Nucl. Acids Res. 18:6531-6535 (1990)) and cleavable amplified polymorphic
sequences (CAPS) (Lyamichev et al., Science 260:778-783 (1993)). It is understood that one or
more of the nucleic acid molecules of the invention may be utilized as markers or probes to
detect polymorphisms by RAPD or CAPS analysis.

Single Nucleotide Polymorphisms (SNPs) generally occur at greater frequency than other
polymorphic markers and are spaced with a greater uniformity throughout a genome than other
reported forms of polymorphism. The greater frequency and uniformity of SNPs means that
there is greater probability that such a polymorphism will be found near or in a genetic locus of
interest than would be the case for other polymorphisms. SNPs are located in protein-coding
regions and noncoding regions of a genome. Some of these SNPs may result in defective or
variant protein expression (e.g., as a result of mutations or defective splicing). Analysis
(genotyping) of characterized SNPs can require only a plus/minus assay rather than a lengthy
measurement, permitting easier automation.

SNPs can be characterized using any of a variety of methods. Such methods include the
direct or indirect sequencing of the site, the use of restriction enzymes (Botstein et al., Am. J.
Hum. Genet. 32:314-331 (1980), the entirety of which is herein incorporated by reference;
Konieczny and Ausubel, Plant J. 4:403-410 (1993), the entirety of which is herein incorporated
by reference), enzymatic and chemical mismatch assays (Myers et al., Nature 313:495-498
(1985), the entirety of which is herein incorporated by reference), allele-specific PCR (Newton et
al., Nucl. Acids Res. 17:2503-2516 (1989), the entirety of which is herein incorporated by
reference; Wu et al., Proc. Natl. Acad. Sci. USA 86:2757-2760 (1989), the entirety of which is
herein incorporated by reference), ligase chain reaction (Barany, Proc. Natl. Acad. Sci. USA
88:189-193 (1991), the entirety of which is herein incorporated by reference), single-strand
conformation polymorphism analysis (Labrune et al., Am. J. Hum. Genet. 48:1115-1120 (1991),
the entirety of which is herein incorporated by reference), single base primer extension
(Kuppuswamy et al., Proc. Natl. Acad. Sci. USA 88:1143-1147 (1991); Goelet, US 6,004,744;
Goelet, US 5,888,819; all of which are herein incorporated by reference in their entirety), solid-phase
ELISA-based oligonucleotide ligation assays (Nikiforov et al., Nucl. Acids Res. 22:4167-4175
(1994)), dideoxy fingerprinting (Sarkar et al., Genomics 13:441-443 (1992), the entirety of which
is herein incorporated by reference), oligonucleotide fluorescence-quenching assays (Livak et al.,
PCR Methods Appl. 4:351-362 (1995a), the entirety of which is herein incorporated by
reference), 5'-nuclease allele-specific hybridization TaqMan™ assay (Livak et al., Nature Genet.
9:341-342 (1995), the entirety of which is herein incorporated by reference), template-directed
dye-terminator incorporation (TDI) assay (Chen and Kwok, Nucl. Acids Res. 25:347-353 (1997),
the entirety of which is herein incorporated by reference), allele-specific molecular beacon assay
(Tyagi et al., Nature Biotech. 16:49-53 (1998), the entirety of which is herein incorporated by
reference), PinPoint assay (Haff and Smirnov, Genome Res. 7:378-388 (1997), the entirety of
which is herein incorporated by reference), dCAPS analysis (Neff et al., Plant J. 14:387-392
(1998), the entirety of which is herein incorporated by reference), pyrosequencing (Ronaghi et al.,
Analytical Biochemistry 267:65-71 (1999); Ronaghi et al., PCT application WO 98/13523; Nyren
et al., PCT application WO 98/28440, all of which are herein incorporated by reference in their
entirety; http://www.pyrosequencing.com), using mass spectrometry, e.g., the Masscode™ system
(Howbert et al., WO 99/05319; Howbert et al., WO 97/27331, all of which are herein incorporated
by reference in their entirety; http://www.rapigene.com; Becker et al., PCT application WO
98/26095; Becker et al., PCT application WO 98/12355; Becker et al., PCT application WO
97/33000; Monforte et al., US 5,965,363, all of which are herein incorporated by reference in
their entirety), invasive cleavage of oligonucleotide probes (Lyamichev et al., Nature
Biotechnology 17:292-296, herein incorporated by reference in its entirety; http://www.twt.com),
and using high density oligonucleotide arrays (Hacia et al., Nature Genetics 22:164-167, herein
incorporated by reference in its entirety; http://www.affymetrix.com).

Polymorphisms may also be detected using allele-specific oligonucleotides (ASO), 
which can be used, for example, in combination with hybridization-based technology including 
Southern, northern, and dot blot hybridizations, reverse dot blot hybridizations and hybridizations 
performed on microarray and related technology. 

The stringency of hybridization for polymorphism detection is highly dependent upon a 
variety of factors, including length of the allele-specific oligonucleotide, sequence composition, 
degree of complementarity (i.e. presence or absence of base mismatches), concentration of salts 
and other factors such as formamide, and temperature. These factors are important both during 
the hybridization itself and during subsequent washes performed to remove target polynucleotide 
that is not specifically hybridized. In practice, the conditions of the final, most stringent wash are 
most critical. In addition, the amount of target polynucleotide that is able to hybridize to the 
allele-specific oligonucleotide is also governed by such factors as the concentration of both the 
ASO and the target polynucleotide, the presence and concentration of factors that act to "tie up" 
water molecules, so as to effectively concentrate the reagents (e.g., PEG, dextran, dextran sulfate, 
etc.), whether the nucleic acids are immobilized or in solution, and the duration of hybridization 
and washing steps. 

Hybridizations are preferably performed below the melting temperature (Tm) of the ASO. 
The closer the hybridization and/or washing step is to the Tm, the higher the stringency. Tm for 
an oligonucleotide may be approximated, for example, according to the following formula: Tm = 
81.5 + 16.6 x (log10[Na+]) + 0.41 x (%G+C) - 675/n; where [Na+] is the molar salt 
concentration of Na+ or any other suitable cation and n = number of bases in the oligonucleotide. 
Other formulas for approximating Tm are available and are known to those of ordinary skill in the 
art. 
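
By way of illustration only, the approximation above can be evaluated with a short Python 
sketch. The function name approx_tm and its parameter names are hypothetical and are not part 
of the disclosure; the sketch assumes an unambiguous A/C/G/T sequence and a molar cation 
concentration greater than zero. 

```python
import math


def approx_tm(oligo: str, na_molar: float) -> float:
    """Approximate Tm in degrees C using the formula quoted in the text:
    Tm = 81.5 + 16.6 * log10[Na+] + 0.41 * (%G+C) - 675/n
    """
    seq = oligo.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("oligonucleotide must contain at least one base")
    if na_molar <= 0:
        raise ValueError("cation concentration must be positive (molar)")
    # Percentage of G and C bases in the oligonucleotide
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 675.0 / n


if __name__ == "__main__":
    # Hypothetical 20-mer with 50% G+C at 1 M and at 0.1 M monovalent cation
    probe = "ATGCATGCATGCATGCATGC"
    print(round(approx_tm(probe, 1.0), 2))
    print(round(approx_tm(probe, 0.1), 2))
```

Lowering the salt concentration lowers the estimated Tm, so the same hybridization or wash 
temperature becomes more stringent, consistent with the factors listed above. 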

Stringency is preferably adjusted so as to allow a given ASO to differentially hybridize to 
a target polynucleotide of the correct allele and a target polynucleotide of the incorrect allele. 
Preferably, there will be at least a two-fold differential between the signal produced by the ASO 
hybridizing to a target polynucleotide of the correct allele and the level of the signal produced by 
the ASO cross-hybridizing to a target polynucleotide of the incorrect allele (e.g., an ASO specific 
for a mutant allele cross-hybridizing to a wild-type allele). In more preferred embodiments of the 
present invention, there is at least a five-fold signal differential. In highly preferred embodiments 
of the present invention, there is at least an order of magnitude signal differential between the 
ASO hybridizing to a target polynucleotide of the correct allele and the level of the signal 
produced by the ASO cross-hybridizing to a target polynucleotide of the incorrect allele. 
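
The signal differentials described above can be checked with a simple calculation. The 
following Python sketch is illustrative only; the function name and the tier labels are 
hypothetical, and it assumes that both signals are background-corrected and expressed in the 
same units. 

```python
def signal_differential_tier(correct_signal: float, incorrect_signal: float) -> str:
    """Classify the ratio of correct-allele signal to cross-hybridization signal.

    Tiers follow the text: at least 2-fold, at least 5-fold, and at least an
    order of magnitude (10-fold). The labels themselves are illustrative.
    """
    if correct_signal < 0 or incorrect_signal <= 0:
        raise ValueError("signals must be non-negative, and the denominator positive")
    fold = correct_signal / incorrect_signal
    if fold >= 10:
        return "highly preferred"
    if fold >= 5:
        return "more preferred"
    if fold >= 2:
        return "preferred"
    return "insufficient"


if __name__ == "__main__":
    # Hypothetical signals: mutant-specific ASO on mutant vs. wild-type target
    print(signal_differential_tier(1200.0, 100.0))
    print(signal_differential_tier(250.0, 100.0))
```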

While certain methods for detecting polymorphisms are described herein, other detection 
methodologies may be utilized. For example, additional methodologies are known and set forth 
in Birren et al., Genome Analysis, 4:135-186, A Laboratory Manual. Mapping Genomes, Cold 
Spring Harbor Laboratory Press, Cold Spring Harbor, NY (1999); Maliga et al., Methods in Plant 
Molecular Biology, A Laboratory Course Manual, Cold Spring Harbor Laboratory Press, Cold 
Spring Harbor, NY (1995); Paterson, Biotechnology Intelligence Unit: Genome Mapping in 
Plants, R.G. Landes Co., Georgetown, TX, and Academic Press, San Diego, CA (1996); The 
Maize Handbook, Freeling and Walbot, eds., Springer-Verlag, New York, NY (1994); Methods 
in Molecular Medicine: Molecular Diagnosis of Genetic Diseases, Elles, ed., Humana Press, 
Totowa, NJ (1996); Clark, ed., Plant Molecular Biology: A Laboratory Manual, 
Springer-Verlag, Berlin, Germany (1997), all of which are herein incorporated by reference in 
their entirety. 

Requirements for marker-assisted selection in a plant breeding program are: (1) the 
marker(s) should co-segregate or be closely linked with the desired trait; (2) an efficient means of 
screening large populations for the molecular marker(s) should be available; and (3) the 
screening technique should have high reproducibility across laboratories and preferably be 
economical to use and be user-friendly. 

The genetic linkage of marker molecules can be established by a gene mapping model 
such as, without limitation, the flanking marker model reported by Lander and Botstein, Genetics 
121:185-199 (1989) and the interval mapping, based on maximum likelihood methods described 
by Lander and Botstein, Genetics 121:185-199 (1989) and implemented in the software package 
MAPMAKER/QTL (Lincoln and Lander, Mapping Genes Controlling Quantitative Traits Using 
MAPMAKER/QTL, Whitehead Institute for Biomedical Research, Massachusetts (1990)). 
Additional software includes Qgene, Version 2.23 (1996) (Department of Plant Breeding and 
Biometry, 266 Emerson Hall, Cornell University, Ithaca, NY). Use of Qgene software is a 
particularly preferred approach. 

A maximum likelihood estimate (MLE) for the presence of a marker is calculated, 
together with an MLE assuming no QTL effect, to avoid false positives. A log10 of an odds ratio 
(LOD) is then calculated as: LOD = log10 (MLE for the presence of a QTL/MLE given no linked 
QTL). 
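
As a minimal sketch of the calculation just described, the following Python functions 
compute a LOD score either from the two maximized likelihoods directly or from their natural 
logarithms, which is often more convenient numerically. The function names are hypothetical, 
and the likelihood values in the example are invented for illustration. 

```python
import math


def lod_score(likelihood_with_qtl: float, likelihood_no_qtl: float) -> float:
    """LOD = log10(MLE assuming a QTL / MLE assuming no linked QTL)."""
    if likelihood_with_qtl <= 0 or likelihood_no_qtl <= 0:
        raise ValueError("likelihoods must be positive")
    return math.log10(likelihood_with_qtl / likelihood_no_qtl)


def lod_from_log_likelihoods(ln_with_qtl: float, ln_no_qtl: float) -> float:
    """Same quantity computed from natural-log likelihoods,
    avoiding underflow when the likelihoods themselves are very small."""
    return (ln_with_qtl - ln_no_qtl) / math.log(10.0)


if __name__ == "__main__":
    # Hypothetical maximized likelihoods under the two models
    print(round(lod_score(1e-10, 1e-13), 3))
    print(round(lod_from_log_likelihoods(math.log(1e-10), math.log(1e-13)), 3))
```

A LOD of 3 corresponds to the data being 1000 times more likely under the model with a QTL 
than under the model without one. 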

The LOD score essentially indicates how much more likely the data are to have arisen 
assuming the presence of a QTL than in its absence. The LOD threshold value for avoiding a 
false positive with a given confidence, say 95%, depends on the number of markers and the 
length of the genome. Graphs indicating LOD thresholds are set forth in Lander and Botstein, 
Genetics 121:185-199 (1989) and further described by Arús and Moreno-González, Plant 
Breeding, Hayward et al. (eds.), Chapman & Hall, London, pp. 314-331 (1993). 

In a preferred embodiment of the present invention the nucleic acid marker exhibits a 
LOD score of greater than 2.0, more preferably 2.5, even more preferably greater than 3.0 or 4.0 
with the trait or phenotype of interest. In a preferred embodiment, the trait of interest is altered, 
preferably increased phytosterol levels or compositions. 

Additional models can be used. Many modifications and alternative approaches to 
interval mapping have been reported, including the use of non-parametric methods (Kruglyak and 
Lander, Genetics 139:1421-1428 (1995)). Multiple regression methods or models can also be 
used, in which the trait is regressed on a large number of markers (Jansen, Biometrics in Plant 
Breeding, van Oijen and Jansen (eds.), Proceedings of the Ninth Meeting of the Eucarpia Section 
Biometrics in Plant Breeding, The Netherlands, pp. 116-124 (1994); Weber and Wricke, 
Advances in Plant Breeding, Blackwell, Berlin, 16 (1994)). Procedures combining interval 
mapping with regression analysis, whereby the phenotype is regressed onto a single putative QTL 
at a given marker interval and at the same time onto a number of markers that serve as 'cofactors,' 
have been reported by Jansen and Stam, Genetics 136:1447-1455 (1994), and Zeng, Genetics 
136:1457-1468 (1994). Generally, the use of cofactors reduces the bias and sampling error of 
the estimated QTL positions (Utz and Melchinger, Biometrics in Plant Breeding, van Oijen and 
Jansen (eds.), Proceedings of the Ninth Meeting of the Eucarpia Section Biometrics in Plant 
Breeding, The Netherlands, pp. 195-204 (1994)), thereby improving the precision and efficiency of 
QTL mapping (Zeng, Genetics 136:1457-1468 (1994), herein incorporated by reference in its 
entirety). These models can be extended to multi-environment experiments to analyze genotype-environment 
interactions (Jansen et al., Theor. Appl. Genet. 91:33-37 (1995), herein incorporated 
by reference in its entirety). 
It is understood that one or more of the nucleic acid molecules of the invention may be 
used as molecular markers. It is also understood that one or more of the protein molecules of the 
invention may be used as molecular markers. 

In a preferred embodiment, the polymorphism is present and screened for in a mapping 
population, e.g. a collection of plants capable of being used with markers such as polymorphic 
markers to map genetic position of traits. The choice of appropriate mapping population often 
depends on the type of marker systems employed (Tanksley et al., J. P. Gustafson and R. Appels 
(eds.), Plenum Press, New York, pp. 157-173 (1988), the entirety of which is herein incorporated 
by reference). Consideration must be given to the source of parents (adapted vs. exotic) used in 
the mapping population. Chromosome pairing and recombination rates can be severely disturbed 
(suppressed) in wide crosses (adapted x exotic) and generally yield greatly reduced linkage 
distances. Wide crosses will usually provide segregating populations with a relatively large 
number of polymorphisms when compared to progeny in a narrow cross (adapted x adapted). 

An F2 population is the first generation of selfing (self-pollinating) after the hybrid seed is 
produced. Usually a single F1 plant is selfed to generate a population segregating for all the 
genes in a Mendelian (1:2:1) pattern. Maximum genetic information is obtained from a completely 
classified F2 population using a codominant marker system (Mather, Measurement of Linkage in 
Heredity: Methuen and Co., (1938), the entirety of which is herein incorporated by reference). In 
the case of dominant markers, progeny tests (e.g., F3, BCF2) are required to identify the 
heterozygotes, in order to classify the population. However, this procedure is often prohibitive 
because of the cost and time involved in progeny testing. Progeny testing of F2 individuals is 
often used in map construction where phenotypes do not consistently reflect genotype (e.g. 
disease resistance) or where trait expression is controlled by a QTL. Segregation data from 
progeny test populations (e.g. F3 or BCF2) can be used in map construction. Marker-assisted 
selection can then be applied to cross progeny based on marker-trait map associations (F2, F3), 
where linkage groups have not been completely disassociated by recombination events (i.e., 
maximum disequilibrium). 
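
As an illustration of the Mendelian (1:2:1) segregation expected for a codominant marker in 
an F2 population, the following Python sketch tests observed genotype counts against that 
ratio. The chi-square goodness-of-fit test is a standard statistical check chosen here for 
illustration and is not prescribed by this disclosure; the function names and the example 
counts are hypothetical. 

```python
import math


def chi_square_1_2_1(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square statistic for observed F2 genotype counts against 1:2:1."""
    total = n_aa + n_ab + n_bb
    if total == 0:
        raise ValueError("at least one plant must be scored")
    observed = (n_aa, n_ab, n_bb)
    expected = (0.25 * total, 0.50 * total, 0.25 * total)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_two_df(chi_square: float) -> float:
    """Upper-tail probability for 2 degrees of freedom.
    For exactly 2 degrees of freedom this is exp(-x/2)."""
    return math.exp(-chi_square / 2.0)


if __name__ == "__main__":
    # Hypothetical scoring of 100 F2 plants at one codominant marker
    stat = chi_square_1_2_1(30, 50, 20)
    print(stat, round(p_value_two_df(stat), 4))
```

A small p-value would suggest segregation distortion at the marker, which may be worth 
examining before the marker is used in map construction. 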

Recombinant inbred lines (RIL) (genetically related lines; usually >F5, developed from 
continuously selfing F2 lines towards homozygosity) can be used as a mapping population. 
Information obtained from dominant markers can be maximized by using RIL because all loci are 
homozygous or nearly so. Under conditions of tight linkage (i.e., about <10% recombination), 
dominant and co-dominant markers evaluated in RIL populations provide more information per 
individual than either marker type in backcross populations (Reiter et al., Proc. Natl. Acad. Sci. 
(U.S.A.) 89:1477-1481 (1992), the entirety of which is herein incorporated by reference). 
However, as the distance between markers becomes larger (i.e., loci become more independent), 
the information in RIL populations decreases dramatically when compared to codominant 
markers. 

Backcross populations (e.g., generated from a cross between a successful variety 
(recurrent parent) and another variety (donor parent) carrying a trait not present in the former) 
can be utilized as a mapping population. A series of backcrosses to the recurrent parent can be 
made to recover most of its desirable traits. Thus a population is created consisting of individuals 
nearly like the recurrent parent but each individual carries varying amounts or mosaic of genomic 
regions from the donor parent. Backcross populations can be useful for mapping dominant 
markers if all loci in the recurrent parent are homozygous and the donor and recurrent parent 
have contrasting polymorphic marker alleles (Reiter et al., Proc. Natl. Acad. Sci. (U.S.A.) 
89:1477-1481 (1992), the entirety of which is herein incorporated by reference). Information 
obtained from backcross populations using either codominant or dominant markers is less than 
that obtained from F2 populations because one, rather than two, recombinant gamete is sampled 
per plant. Backcross populations, however, are more informative (at low marker saturation) 
when compared to RILs as the distance between linked loci increases in RIL populations (i.e. 
about .15% recombination). Increased recombination can be beneficial for resolution of tight 
linkages, but may be undesirable in the construction of maps with low marker saturation. 

Near-isogenic lines (NIL) (created by many backcrosses to produce a collection of 
individuals that is nearly identical in genetic composition except for the trait or genomic region 
under interrogation) can be used as a mapping population. In mapping with NILs, only a portion 
of the polymorphic loci is expected to map to a selected region. 

Bulk segregant analysis (BSA) is a method developed for the rapid identification of 
linkage between markers and traits of interest (Michelmore et al., Proc. Natl. Acad. Sci. U.S.A. 
88:9828-9832 (1991), the entirety of which is herein incorporated by reference). In BSA, two 
bulked DNA samples are drawn from a segregating population originating from a single cross. 
These bulks contain individuals that are identical for a particular trait (e.g., resistant or susceptible to 
a particular disease) or genomic region but arbitrary at unlinked regions (i.e. heterozygous). 
Regions unlinked to the target region will not differ between the bulked samples of many 
individuals in BSA. 
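
The logic of BSA can be sketched in a few lines of Python. In this illustrative example, each 
bulk is represented as a mapping from marker name to the set of alleles detected in that 
bulked DNA sample; markers whose allele sets do not overlap between the two bulks are 
reported as candidates for linkage to the trait, while markers showing the same alleles in both 
bulks are treated as unlinked. The data structure, function name and marker names are 
hypothetical. 

```python
from typing import Dict, List, Set


def candidate_linked_markers(
    bulk_1: Dict[str, Set[str]], bulk_2: Dict[str, Set[str]]
) -> List[str]:
    """Return markers scored in both bulks whose detected alleles are disjoint.

    Unlinked markers are expected to show the same alleles in both bulks,
    so only markers that differ between the bulks are returned.
    """
    scored_in_both = bulk_1.keys() & bulk_2.keys()
    return sorted(
        marker
        for marker in scored_in_both
        if bulk_1[marker] and bulk_2[marker]
        and bulk_1[marker].isdisjoint(bulk_2[marker])
    )


if __name__ == "__main__":
    # Hypothetical allele calls for a resistant bulk and a susceptible bulk
    resistant = {"M1": {"A"}, "M2": {"A", "B"}, "M3": {"B"}}
    susceptible = {"M1": {"B"}, "M2": {"A", "B"}, "M3": {"B"}}
    print(candidate_linked_markers(resistant, susceptible))
```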

In an aspect of the present invention, one or more of the nucleic acid molecules of the present 
invention are used to determine the level (i.e., the concentration of mRNA in a sample, etc.) in a 
plant (preferably maize, canola, soybean, crambe, mustard, castor bean, peanut, sesame, 
cottonseed, linseed, safflower, oil palm, flax or sunflower) or pattern (i.e., the kinetics of 
expression, rate of decomposition, stability profile, etc.) of the expression of a protein encoded in 
part or whole by one or more of the nucleic acid molecules of the present invention (collectively, 
the "Expression Response" of a cell or tissue). 

As used herein, the Expression Response manifested by a cell or tissue is said to be 
"altered" if it differs from the Expression Response of cells or tissues of plants not exhibiting the 
phenotype. To determine whether an Expression Response is altered, the Expression Response 
manifested by the cell or tissue of the plant exhibiting the phenotype is compared with that of a 
similar cell or tissue sample of a plant not exhibiting the phenotype. As will be appreciated, it is 
not necessary to re-determine the Expression Response of the cell or tissue sample of plants not 
exhibiting the phenotype each time such a comparison is made; rather, the Expression Response 
of a particular plant may be compared with previously obtained values of normal plants. As used 
herein, the phenotype of the organism is any of one or more characteristics of an organism (e.g. 
disease resistance, pest tolerance, environmental tolerance such as tolerance to abiotic stress, 
male sterility, quality improvement or yield etc.). A change in genotype or phenotype may be 
transient or permanent. Also as used herein, a tissue sample is any sample that comprises more 
than one cell. In a preferred aspect, a tissue sample comprises cells that share a common 
characteristic (e.g. derived from root, seed, flower, leaf, stem or pollen etc.). 

In one aspect of the present invention, an evaluation can be conducted to determine 
whether a particular mRNA molecule is present. One or more of the nucleic acid molecules of 
the present invention are utilized to detect the presence or quantity of the mRNA species. Such 
molecules are then incubated with cell or tissue extracts of a plant under conditions sufficient to 
permit nucleic acid hybridization. The detection of double-stranded probe-mRNA hybrid 
molecules is indicative of the presence of the mRNA; the amount of such hybrid formed is 
proportional to the amount of mRNA. Thus, such probes may be used to ascertain the level and 
extent of the mRNA production in a plant's cells or tissues. Such nucleic acid hybridization may 
be conducted under quantitative conditions (thereby providing a numerical value of the amount 
of the mRNA present). Alternatively, the assay may be conducted as a qualitative assay that 
indicates either that the mRNA is present, or that its level exceeds a user-set, predefined value. 

A number of methods can be used to compare the expression response between two or 
more samples of cells or tissue. These methods include hybridization assays, such as northerns, 
RNAse protection assays, and in situ hybridization. Alternatively, the methods include PCR-type 
assays. In a preferred method, the expression response is compared by hybridizing nucleic acids 
from the two or more samples to an array of nucleic acids. The array contains a plurality of 
sequences known or suspected of being present in the cells or tissue of the samples. 

An advantage of in situ hybridization over more conventional techniques for the detection 
of nucleic acids is that it allows an investigator to determine the precise spatial population 
(Angerer et al., Dev. Biol. 101:477-484 (1984); Angerer et al., Dev. Biol. 112:157-166 (1985); 
Dixon et al., EMBO J. 10:1317-1324 (1991)). In situ hybridization may be used to measure the 
steady-state level of RNA accumulation (Hardin et al., J. Mol. Biol. 202:417-431 (1989)). A 
number of protocols have been devised for in situ hybridization, each with tissue preparation, 
hybridization and washing conditions (Meyerowitz, Plant Mol. Biol. Rep. 5:242-250 (1987); Cox 
and Goldberg, In: Plant Molecular Biology: A Practical Approach, Shaw (ed.), pp. 1-35, IRL 
Press, Oxford (1988); Raikhel et al., In situ RNA hybridization in plant tissues, In: Plant 
Molecular Biology Manual, vol. B9:1-32, Kluwer Academic Publisher, Dordrecht, The Netherlands 
(1989)). 

In situ hybridization also allows for the localization of proteins within a tissue or cell 
(Wilkinson, In Situ Hybridization, Oxford University Press, Oxford (1992); Langdale, In Situ 
Hybridization In: The Maize Handbook, Freeling and Walbot (eds.), pp. 165-179, Springer-Verlag, 
New York (1994)). It is understood that one or more of the molecules of the invention, 
preferably one or more of the nucleic acid molecules or fragments thereof of the invention or one 
or more of the antibodies of the invention may be utilized to detect the level or pattern of a 
protein or mRNA thereof by in situ hybridization. 

Fluorescent in situ hybridization allows the localization of a particular DNA sequence 
along a chromosome, which is useful, among other uses, for gene mapping, following 
chromosomes in hybrid lines, or detecting chromosomes with translocations, transversions or 
deletions. In situ hybridization has been used to identify chromosomes in several plant species 
(Griffor et al., Plant Mol. Biol. 17:101-109 (1991); Gustafson et al., Proc. Natl. Acad. Sci. 
(U.S.A.) 87:1899-1902 (1990); Mukai and Gill, Genome 34:448-452 (1991); Schwarzacher and 
Heslop-Harrison, Genome 34:317-323 (1991); Wang et al., Jpn. J. Genet. 66:313-316 (1991); 
Parra and Windle, Nature Genetics 5:17-21 (1993)). It is understood that the nucleic acid 
molecules of the invention may be used as probes or markers to localize sequences along a 
chromosome. 

Another method to localize the expression of a molecule is tissue printing. Tissue 
printing provides a way to screen, at the same time on the same membrane, many tissue sections 
from different plants or different developmental stages (Yomo and Taylor, Planta 112:35-43 
(1973); Harris and Chrispeels, Plant Physiol. 56:292-299 (1975); Cassab and Varner, J. Cell. 
Biol. 105:2581-2588 (1987); Spruce et al., Phytochemistry 26:2901-2903 (1987); Barres et al., 
Neuron 5:527-544 (1990); Reid and Pont-Lezica, Tissue Printing: Tools for the Study of 
Anatomy, Histochemistry and Gene Expression, Academic Press, New York, New York (1992); 
Reid et al., Plant Physiol. 93:160-165 (1990); Ye et al., Plant J. 1:175-183 (1991)). 

A microarray-based method for high-throughput monitoring of gene expression may be 
utilized to measure expression response. This 'chip'-based approach involves microarrays of 
nucleic acid molecules as gene-specific hybridization targets to quantitatively measure 
expression of the corresponding mRNA (Schena et al., Science 270:467-470 (1995), the entirety 
of which is herein incorporated by reference; http://cmgm.stanford.edu/pbrown/array.html; 
Shalon, Ph.D. Thesis, Stanford University (1996), the entirety of which is herein incorporated by 
reference). Hybridization to a microarray can be used to efficiently analyze the presence and/or 
amount of a number of nucleotide sequences simultaneously. 

Several microarray methods have been described. One method compares the sequences 
to be analyzed by hybridization to a set of oligonucleotides representing all possible 
subsequences (Bains and Smith, J. Theor. Biol. 135:303-307 (1989), the entirety of which is 
herein incorporated by reference). A second method hybridizes the sample to an array of 
oligonucleotide or cDNA molecules. An array consisting of oligonucleotides complementary to 
subsequences of a target sequence can be used to determine the identity of a target sequence, 
measure its amount, and detect single nucleotide differences between the target and a reference 
sequence. Nucleic acid molecule microarrays may also be screened with protein molecules or 
fragments thereof to determine nucleic acid molecules that specifically bind protein molecules or 
fragments thereof. 

The microarray approach may be used with polypeptide targets (U.S. Patent No. 
5,445,934; U.S. Patent No. 5,143,854; U.S. Patent No. 5,079,600; U.S. Patent No. 4,923,901, all 
of which are herein incorporated by reference in their entirety). Essentially, polypeptides are 
synthesized on a substrate (microarray) and these polypeptides can be screened with either 
protein molecules or fragments thereof or nucleic acid molecules in order to screen for either 
protein molecules or fragments thereof or nucleic acid molecules that specifically bind the target 
polypeptides (Fodor et al., Science 251:767-773 (1991), the entirety of which is herein 
incorporated by reference). It is understood that one or more of the nucleic acid molecules or 
protein or fragments thereof of the invention may be utilized in a microarray-based method. 

In a preferred embodiment of the present invention microarrays may be prepared that 
comprise nucleic acid molecules where preferably at least 10%, preferably at least 25%, more 
preferably at least 50% and even more preferably at least 75%, 80%, 85%, 90% or 95% of the 
nucleic acid molecules located on that array are selected from the group of nucleic acid 
molecules that specifically hybridize to one or more nucleic acid molecule having a nucleic acid 
sequence selected from the group of SEQ ID NO: 1 through SEQ ID NO: 621 or complements 
thereof or fragments of either. 

In another preferred embodiment of the present invention microarrays may be prepared 
that comprise nucleic acid molecules where preferably at least 10%, preferably at least 25%, 
more preferably at least 50% and even more preferably at least 75%, 80%, 85%, 90% or 95% of 
the nucleic acid molecules located on that array are selected from the group of nucleic acid 
molecules having a nucleic acid sequence selected from the group of SEQ ID NO: 1 through 
SEQ ID NO: 621 or complements thereof or fragments of either. 

In a preferred embodiment of the present invention microarrays may be prepared that 
comprise nucleic acid molecules where such nucleic acid molecules encode at least one, 
preferably at least two, more preferably at least three, even more preferably at least four, five or 
six proteins or fragments thereof selected from the group consisting of HES1, HMGCoA 
reductase, squalene synthase, cycloartenol synthase, SMTII and UPC2. In an even more preferred 
embodiment of the present invention microarrays may be prepared that comprise nucleic acid 
molecules where such nucleic acid molecules encode at least one, preferably at least two, more 
preferably at least three, even more preferably at least four, five or six proteins or fragments 
thereof selected from the group consisting of a fungal, more preferably a yeast HES1, a plant, 
more preferably a maize, soybean or Arabidopsis HES1, a plant, more preferably a rubber or an 
Arabidopsis HMGCoA reductase, a plant, more preferably an Arabidopsis squalene synthase, a 
plant, more preferably an Arabidopsis cycloartenol synthase, a plant, more preferably an 
Arabidopsis SMTII and a fungal, more preferably a yeast UPC2. 

Site directed mutagenesis may be utilized to modify nucleic acid sequences, particularly 
as it is a technique that allows one or more of the amino acids encoded by a nucleic acid 
molecule to be altered (e.g., a threonine to be replaced by a methionine). At least three basic 
methods for site directed mutagenesis can be employed. These are cassette mutagenesis (Wells 
et al., Gene 34:315-323 (1985), the entirety of which is herein incorporated by reference), primer 
extension (Gilliam et al., Gene 12:129-137 (1980), the entirety of which is herein incorporated 
by reference; Zoller and Smith, Methods Enzymol. 100:468-500 (1983), the entirety of which is 
herein incorporated by reference; Dalbadie-McFarland et al., Proc. Natl. Acad. Sci. (U.S.A.) 
79:6409-6413 (1982), the entirety of which is herein incorporated by reference) and methods 
based upon PCR (Scharf et al., Science 233:1076-1078 (1986), the entirety of which is herein 
incorporated by reference; Higuchi et al., Nucleic Acids Res. 16:7351-7367 (1988), the entirety of 
which is herein incorporated by reference). Site directed mutagenesis approaches are also 
described in U.S. Patent 5,811,238; European Patent 0 385 962, the entirety of which is herein 
incorporated by reference; European Patent 0 359 472, the entirety of which is herein 
incorporated by reference; and PCT Patent Application WO 93/07278, the entirety of which is 
herein incorporated by reference. 
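
The effect of a site directed change such as the threonine-to-methionine replacement 
mentioned above can be illustrated in silico. The following Python sketch substitutes one 
codon in a coding sequence and translates the result with the standard genetic code. It 
models only the sequence change, not any laboratory method; the function names and the 
example sequence are hypothetical. 

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence (length divisible by 3) into one-letter codes."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def substitute_codon(cds: str, residue: int, new_codon: str) -> str:
    """Replace the codon for the given 1-based residue position."""
    cds, new_codon = cds.upper(), new_codon.upper()
    if new_codon not in CODON_TABLE:
        raise ValueError("new_codon must be a valid DNA codon")
    start = 3 * (residue - 1)
    if residue < 1 or start + 3 > len(cds):
        raise ValueError("residue position is outside the coding sequence")
    return cds[:start] + new_codon + cds[start + 3:]


if __name__ == "__main__":
    wild_type = "ATGACGGGCTAA"  # hypothetical sequence encoding Met-Thr-Gly-stop
    mutant = substitute_codon(wild_type, 2, "ATG")  # Thr (ACG) -> Met (ATG)
    print(translate(wild_type), translate(mutant))
```

In this example a single nucleotide change (C to T at the second position of the codon) is 
sufficient to convert the threonine codon ACG to the methionine codon ATG. 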

Site directed mutagenesis strategies have been applied to plants for both in vitro as well 
as in vivo site directed mutagenesis (Lanz et al., J. Biol. Chem. 266:9971-9976 (1991), the 
entirety of which is herein incorporated by reference; Kovgan and Zhdanov, Biotekhnologiya 
5:148-154, No. 207160n, Chemical Abstracts 110:225 (1989), the entirety of which is herein 
incorporated by reference; Ge et al., Proc. Natl. Acad. Sci. (U.S.A.) 86:4037-4041 (1989), the 
entirety of which is herein incorporated by reference; Zhu et al., J. Biol. Chem. 271:18494-18498 
(1996), the entirety of which is herein incorporated by reference; Chu et al., Biochemistry 
33:6150-6157 (1994), the entirety of which is herein incorporated by reference; Small et al., 
EMBO J. 11:1291-1296 (1992), the entirety of which is herein incorporated by reference; Cho et 
al., Mol. Biotechnol. 8:13-16 (1997), the entirety of which is herein incorporated by reference; 
Kita et al., J. Biol. Chem. 271:26529-26535 (1996), the entirety of which is herein incorporated 
by reference; Jin et al., Mol. Microbiol. 7:555-562 (1993), the entirety of which is herein 
incorporated by reference; Hatfield and Vierstra, J. Biol. Chem. 267:14799-14803 (1992), the 
entirety of which is herein incorporated by reference; Zhao et al., Biochemistry 31:5093-5099 
(1992), the entirety of which is herein incorporated by reference). 

Any of the nucleic acid molecules of the invention may either be modified by site directed 
mutagenesis or used as, for example, nucleic acid molecules that are used to target other nucleic 
acid molecules for modification. It is understood that mutants with more than one altered 
nucleotide can be constructed using techniques that practitioners are familiar with, such as 
isolating restriction fragments and ligating such fragments into an expression vector (see, for 
example, Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press 
(1989)). 

Sequence-specific DNA-binding proteins play a role in the regulation of transcription. 
The isolation of recombinant cDNAs encoding these proteins facilitates the biochemical analysis 
of their structural and functional properties. Genes encoding such DNA-binding proteins have 
been isolated using classical genetics (Vollbrecht et al., Nature 350:241-243 (1991), the entirety 
of which is herein incorporated by reference) and molecular biochemical approaches, including 
the screening of recombinant cDNA libraries with antibodies (Landschulz et al., Genes Dev. 
2:786-800 (1988), the entirety of which is herein incorporated by reference) or DNA probes 
(Bodner et al., Cell 55:505-518 (1988), the entirety of which is herein incorporated by reference). 
In addition, an in situ screening procedure has been used and has facilitated the isolation of 
sequence-specific DNA-binding proteins from various plant species (Gilmartin et al., Plant Cell 
4:839-849 (1992), the entirety of which is herein incorporated by reference; Schindler et al., 
EMBO J. 11:1261-1273 (1992), the entirety of which is herein incorporated by reference). An in 
situ screening protocol does not require the purification of the protein of interest (Vinson et al., 
Genes Dev. 2:801-806 (1988), the entirety of which is herein incorporated by reference; Singh et 
al., Cell 52:415-423 (1988), the entirety of which is herein incorporated by reference). 

Two steps may be employed to characterize DNA-protein interactions. The first is to
identify sequence fragments that interact with DNA-binding proteins, to titrate binding activity,
to determine the specificity of binding and to determine whether a given DNA-binding activity
can interact with related DNA sequences (Sambrook et al., Molecular Cloning: A Laboratory
Manual, 2nd edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York
(1989)). The electrophoretic mobility-shift assay is widely used for this purpose. The assay
provides a rapid and sensitive method for detecting DNA-binding proteins based on the
observation that the mobility of a DNA fragment through a nondenaturing, low-ionic strength
polyacrylamide gel is retarded upon association with a DNA-binding protein (Fried and Crothers,
Nucleic Acids Res. 9:6505-6525 (1981), the entirety of which is herein incorporated by
reference). When one or more specific binding activities have been identified, the exact sequence
of the DNA bound by the protein may be determined.
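
By way of illustration only, the titration of binding activity in a mobility-shift assay can be sketched as a short calculation. The band intensities below are hypothetical, the function names are chosen for this example, and reading an apparent dissociation constant from the half-bound point assumes that the labeled probe is present at a concentration well below that constant.

```python
def fraction_bound(free_intensity, shifted_intensity):
    """Fraction of labeled probe found in the retarded (protein-bound) band."""
    total = free_intensity + shifted_intensity
    if total <= 0:
        raise ValueError("lane contains no detectable probe")
    return shifted_intensity / total


def apparent_kd(protein_nM, fractions):
    """Linearly interpolate the protein concentration giving 50% bound probe.

    Returns None when the titration never crosses 0.5.
    """
    points = list(zip(protein_nM, fractions))
    for (c0, f0), (c1, f1) in zip(points, points[1:]):
        if f0 <= 0.5 <= f1 and f1 > f0:
            return c0 + (0.5 - f0) * (c1 - c0) / (f1 - f0)
    return None


# Hypothetical titration: (protein in nM, free band, shifted band), arbitrary units.
lanes = [(0, 100, 0), (5, 80, 20), (10, 60, 40), (20, 40, 60), (40, 20, 80)]
concs = [c for c, _, _ in lanes]
fracs = [fraction_bound(free, shifted) for _, free, shifted in lanes]
kd = apparent_kd(concs, fracs)
print(fracs, kd)
```

Competition with unlabeled related sequences, used to assess specificity, could be analyzed with the same fraction-bound calculation applied to lanes containing competitor DNA.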

Several procedures for characterizing protein/DNA-binding sites are used, including
methylation and ethylation interference assays (Maxam and Gilbert, Methods Enzymol. 65:499-560
(1980), the entirety of which is herein incorporated by reference; Wissman and Hillen,
Methods Enzymol. 208:365-379 (1991), the entirety of which is herein incorporated by
reference), footprinting techniques employing DNase I (Galas and Schmitz, Nucleic Acids Res.
5:3157-3170 (1978), the entirety of which is herein incorporated by reference),
1,10-phenanthroline-copper ion methods (Sigman et al., Methods Enzymol. 205:414-433 (1991), the
entirety of which is herein incorporated by reference) and hydroxyl radical methods (Dixon et
al., Methods Enzymol. 205:414-433 (1991), the entirety of which is herein incorporated by
reference). It is understood that one or more of the nucleic acid molecules of the invention may
be utilized to identify a protein or fragment thereof that specifically binds to a nucleic acid
molecule of the invention. It is also understood that one or more of the protein molecules or
fragments thereof of the invention may be utilized to identify a nucleic acid molecule that
specifically binds to it.

A two-hybrid system is based on the fact that proteins, such as transcription factors, that
physically interact with one another carry out many cellular functions. Two-hybrid systems
have been used to probe the function of new proteins (Chien et al., Proc. Natl. Acad. Sci.
(U.S.A.) 88:9578-9582 (1991), the entirety of which is herein incorporated by reference; Durfee et
al., Genes Dev. 7:555-569 (1993), the entirety of which is herein incorporated by reference; Choi
et al., Cell 78:499-512 (1994), the entirety of which is herein incorporated by reference; Kranz et
al., Genes Dev. 8:313-327 (1994), the entirety of which is herein incorporated by reference).

Interaction mating techniques have facilitated a number of two-hybrid studies of
protein-protein interaction. Interaction mating has been used to examine interactions between
small sets of tens of proteins (Finley and Brent, Proc. Natl. Acad. Sci. (U.S.A.) 91:12980-12984
(1994), the entirety of which is herein incorporated by reference), larger sets of hundreds of
proteins (Bendixen et al., Nucl. Acids Res. 22:1778-1779 (1994), the entirety of which is herein
incorporated by reference) and to comprehensively map proteins encoded by a small genome
(Bartel et al., Nature Genetics 12:72-77 (1996), the entirety of which is herein incorporated by
reference). This technique utilizes proteins fused to the DNA-binding domain and proteins fused
to the activation domain. They are expressed in two different haploid yeast strains of opposite
mating type and the strains are mated to determine if the two proteins interact. Mating occurs
when haploid yeast strains come into contact and results in the fusion of the two haploids into a
diploid yeast strain. An interaction can be determined by the activation of a two-hybrid reporter
gene in the diploid strain.
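
The logic of an interaction mating panel may be sketched as follows. This is a minimal simulation offered under stated assumptions: the strain names, mating types and the set of interacting pairs are hypothetical placeholders and do not correspond to any sequence of the invention.

```python
from itertools import product

# Hypothetical strain panels (names are placeholders).
# Baits carry DNA-binding-domain fusions; preys carry activation-domain fusions.
bait_strains = {"bait_A": "a", "bait_B": "a"}
prey_strains = {"prey_X": "alpha", "prey_Y": "alpha"}

# Assumed ground truth, used only to drive the simulation.
interacting_pairs = {("bait_A", "prey_X")}


def mate(bait, prey):
    """Return the diploid as a (bait, prey) tuple, or None if mating types match."""
    if bait_strains[bait] == prey_strains[prey]:
        return None
    return (bait, prey)


def reporter_active(diploid):
    """The reporter is activated only when both fusions in the diploid interact."""
    return diploid in interacting_pairs


# Mate every bait strain with every prey strain and score the reporter.
grid = {}
for bait, prey in product(bait_strains, prey_strains):
    diploid = mate(bait, prey)
    grid[(bait, prey)] = diploid is not None and reporter_active(diploid)

positives = sorted(pair for pair, active in grid.items() if active)
print(positives)
```

Each haploid strain is transformed once and then mated against the whole opposing panel, so in this sketch the number of transformations grows with the number of strains rather than with the number of pairs tested.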

CLONTECH Laboratories, Inc. provides the MATCHMAKER Two-Hybrid System kit
(Cat. No. K1605-1) in which the sequences encoding the two functional domains of the GAL4
transcriptional activator, the DNA binding domain and the activation domain, are cloned into two
different shuttle/expression vectors (pGBT9 and pGAD424) (Bartel et al., In: Cellular
Interactions in Development: A Practical Approach, D.A. Hartley, ed., Oxford University Press,
Oxford 153-179 (1993), the entirety of which is herein incorporated by reference). The gene
coding for the target protein is cloned into pGBT9 to generate a hybrid of the GAL4 DNA binding
domain with the target protein, and the gene(s) coding for potentially interacting protein(s) are
cloned into pGAD424 to create hybrid protein(s) of the GAL4 activation domain with a potentially
interacting protein or with a collection of random proteins in a fusion library. Both plasmids
carrying hybrid proteins are cotransformed into one yeast strain. Both hybrid proteins are
targeted to the yeast nucleus by a nuclear localization signal. If the target protein and the
potentially interacting protein interact with each other, the GAL4 DNA binding domain and the
GAL4 activation domain are brought into proximity and proper function of the transcriptional
activator unit is reconstituted, resulting in transcription of a reporter gene (lacZ or HIS3). An
advantage of this technique is that it reduces the number of yeast transformations needed to test
individual interactions. It is understood that the protein-protein interactions of a protein or
fragments thereof of the invention may be investigated using the two-hybrid system and that any
of the nucleic acid molecules of the invention that encode such proteins or fragments thereof may
be used to transform yeast in the two-hybrid system.

(f) Fungal Constructs and Fungal Transformants 

The invention also relates to a fungal recombinant vector comprising exogenous genetic
material. The invention also relates to a fungal cell comprising a fungal recombinant vector.
The invention also relates to methods for obtaining a recombinant fungal host cell comprising
introducing into a fungal host cell exogenous genetic material.

Exogenous genetic material may be transferred into a fungal cell. In a preferred
embodiment the exogenous genetic material includes a nucleic acid molecule of the present
invention, preferably a nucleic acid molecule having a sequence selected from the group
consisting of SEQ ID NO: 1 through SEQ ID NO: 621 or complements thereof or fragments of
either. Another preferred class of exogenous genetic material are nucleic acid molecules that
encode a protein having an amino acid sequence selected from the group consisting of SEQ ID
NO: 622 through SEQ ID NO: 626 or fragments thereof.




The fungal recombinant vector may be any vector which can be conveniently subjected to
recombinant DNA procedures. The choice of a vector will typically depend on the compatibility
of the vector with the fungal host cell into which the vector is to be introduced. The vector may
be a linear or a closed circular plasmid. The vector system may be a single vector or plasmid or
two or more vectors or plasmids which together contain the total DNA to be introduced into the
genome of the fungal host.

The fungal vector may be an autonomously replicating vector, i.e., a vector which exists
as an extrachromosomal entity, the replication of which is independent of chromosomal
replication, e.g., a plasmid, an extrachromosomal element, a minichromosome, or an artificial
chromosome. The vector may contain any means for assuring self-replication. Alternatively, the
vector may be one which, when introduced into the fungal cell, is integrated into the genome and
replicated together with the chromosome(s) into which it has been integrated. This integration
may be the result of homologous or non-homologous recombination.

Integration of a vector or nucleic acid into the genome by homologous recombination,
regardless of the host being considered, relies on the nucleic acid sequence of the vector.
Typically, the vector contains nucleic acid sequences for directing integration by homologous
recombination into the genome of the host. These nucleic acid sequences enable the vector to be
integrated into the host cell genome at a precise location or locations in one or more
chromosomes. To increase the likelihood of integration at a precise location, there should
preferably be two nucleic acid sequences that individually contain a sufficient number of nucleic
acids, preferably 400bp to 1500bp, more preferably 800bp to 1000bp, which are highly
homologous with the corresponding host cell target sequence. This enhances the probability of
homologous recombination. These nucleic acid sequences may be any sequence that is
homologous with a host cell target sequence and, furthermore, may or may not encode proteins.
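
As a simple illustration of the length ranges stated above, the following sketch classifies each of two homologous flanking sequences by length. The function names and the example lengths are hypothetical. The sketch checks length only and makes no attempt to assess the degree of homology with the host target sequence.

```python
def classify_flank(length_bp):
    """Classify one homologous flanking sequence by its length in base pairs."""
    if 800 <= length_bp <= 1000:
        return "more preferred"
    if 400 <= length_bp <= 1500:
        return "preferred"
    return "outside preferred range"


def check_flanks(left_bp, right_bp):
    """Classify both flanking sequences of an integration construct."""
    return classify_flank(left_bp), classify_flank(right_bp)


# Hypothetical construct with a 900 bp left flank and a 450 bp right flank.
result = check_flanks(900, 450)
print(result)
```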

For autonomous replication, the vector may further comprise an origin of replication
enabling the vector to replicate autonomously in the host cell in question. Examples of origins of
replication for use in a yeast host cell are the 2 micron origin of replication and the combination
of CEN3 and ARS1. Any origin of replication may be used which is compatible with the fungal
host cell of choice.

The fungal vectors of the invention preferably contain one or more selectable markers
which permit easy selection of transformed cells. A selectable marker is a gene the product of
which provides, for example, biocide or viral resistance, resistance to heavy metals, prototrophy
to auxotrophs and the like. The selectable marker may be selected from the group including, but
not limited to, amdS (acetamidase), argB (ornithine carbamoyltransferase), bar (phosphinothricin
acetyltransferase), hygB (hygromycin phosphotransferase), niaD (nitrate reductase), pyrG
(orotidine-5'-phosphate decarboxylase), sC (sulfate adenyltransferase) and trpC (anthranilate
synthase). Preferred for use in an Aspergillus cell are the amdS and pyrG markers of Aspergillus
nidulans or Aspergillus oryzae and the bar marker of Streptomyces hygroscopicus. Furthermore,
selection may be accomplished by co-transformation, e.g., as described in WO 91/17243, the
entirety of which is herein incorporated by reference. A nucleic acid sequence of the invention
may be operably linked to a suitable promoter sequence. The promoter sequence is a nucleic acid
sequence which is recognized by the fungal host cell for expression of the nucleic acid sequence.
The promoter sequence contains transcription and translation control sequences which mediate
the expression of the protein or fragment thereof.

A promoter may be any nucleic acid sequence which shows transcriptional activity in the
fungal host cell of choice and may be obtained from genes encoding polypeptides either
homologous or heterologous to the host cell. Examples of suitable promoters for directing the
transcription of a nucleic acid construct of the invention in a filamentous fungal host are
promoters obtained from the genes encoding Aspergillus oryzae TAKA amylase, Rhizomucor
miehei aspartic proteinase, Aspergillus niger neutral alpha-amylase, Aspergillus niger acid stable
alpha-amylase, Aspergillus niger or Aspergillus awamori glucoamylase (glaA), Rhizomucor
miehei lipase, Aspergillus oryzae alkaline protease, Aspergillus oryzae triose phosphate
isomerase, Aspergillus nidulans acetamidase and hybrids thereof. In a yeast host, a useful
promoter is the Saccharomyces cerevisiae enolase (eno-1) promoter. Particularly preferred
promoters are the TAKA amylase, NA2-tpi (a hybrid of the promoters from the genes encoding
Aspergillus niger neutral alpha-amylase and Aspergillus oryzae triose phosphate isomerase),
glaA, Saccharomyces cerevisiae GAL1 (galactokinase) and Saccharomyces cerevisiae GPD
(glyceraldehyde-3-phosphate dehydrogenase) promoters.

A nucleic acid molecule of the invention encoding a protein or fragment thereof may also
be operably linked to a terminator sequence at its 3' terminus. The terminator sequence may be
native to the nucleic acid sequence encoding the protein or fragment thereof or may be obtained
from foreign sources. Any terminator which is functional in the fungal host cell of choice may
be used in the invention, but particularly preferred terminators are obtained from the genes
encoding Aspergillus oryzae TAKA amylase, Aspergillus niger glucoamylase, Aspergillus
nidulans anthranilate synthase, Aspergillus niger alpha-glucosidase, Saccharomyces cerevisiae
cytochrome-c oxidase (CYC1) and Saccharomyces cerevisiae enolase.

A nucleic acid molecule of the invention encoding a protein or fragment thereof may also
be operably linked to a suitable leader sequence. A leader sequence is a nontranslated region of
an mRNA which is important for translation by the fungal host. The leader sequence is operably
linked to the 5' terminus of the nucleic acid sequence encoding the protein or fragment thereof.
The leader sequence may be native to the nucleic acid sequence encoding the protein or fragment
thereof or may be obtained from foreign sources. Any leader sequence which is functional in the
fungal host cell of choice may be used in the invention, but particularly preferred leaders are
obtained from the genes encoding Aspergillus oryzae TAKA amylase and Aspergillus oryzae
triose phosphate isomerase.

A polyadenylation sequence may also be operably linked to the 3' terminus of the nucleic
acid sequence of the invention. The polyadenylation sequence is a sequence which when
transcribed is recognized by the fungal host to add polyadenosine residues to transcribed mRNA.
The polyadenylation sequence may be native to the nucleic acid sequence encoding the protein or
fragment thereof or may be obtained from foreign sources. Any polyadenylation sequence which
is functional in the fungal host of choice may be used in the invention, but particularly preferred
polyadenylation sequences are obtained from the genes encoding Aspergillus oryzae TAKA
amylase, Aspergillus niger glucoamylase, Aspergillus nidulans anthranilate synthase, Aspergillus
niger alpha-glucosidase and Saccharomyces cerevisiae cytochrome-c oxidase (CYC1).

To avoid the necessity of disrupting the cell to obtain the protein or fragment thereof and
to minimize the amount of possible degradation of the expressed protein or fragment thereof
within the cell, it is preferred that expression of the protein or fragment thereof gives rise to a
product secreted outside the cell. To this end, a protein or fragment thereof of the invention may
be linked to a signal peptide linked to the amino terminus of the protein or fragment thereof. A
signal peptide is an amino acid sequence which permits the secretion of the protein or fragment
thereof from the fungal host into the culture medium. The signal peptide may be native to the
protein or fragment thereof of the invention or may be obtained from foreign sources. The 5' end
of the coding sequence of the nucleic acid sequence of the invention may inherently contain a
signal peptide coding region naturally linked in translation reading frame with the segment of the
coding region which encodes the secreted protein or fragment thereof. Alternatively, the 5' end
of the coding sequence may contain a signal peptide coding region which is foreign to that
portion of the coding sequence which encodes the secreted protein or fragment thereof. The
foreign signal peptide may be required where the coding sequence does not normally contain a
signal peptide coding region. Alternatively, the foreign signal peptide may simply replace the
natural signal peptide to obtain enhanced secretion of the desired protein or fragment thereof.
The foreign signal peptide coding region may be obtained from a glucoamylase or an amylase
gene from an Aspergillus species, a lipase or proteinase gene from Rhizomucor miehei, the gene
for the alpha-factor from Saccharomyces cerevisiae, or the calf preprochymosin gene. An
effective signal peptide for fungal host cells is the Aspergillus oryzae TAKA amylase signal, the
Aspergillus niger neutral amylase signal, the Rhizomucor miehei aspartic proteinase signal, the
Humicola lanuginosus cellulase signal, or the Rhizomucor miehei lipase signal. However, any
signal peptide capable of permitting secretion of the protein or fragment thereof in a fungal host
of choice may be used in the invention.
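
The requirement that a foreign signal peptide coding region be linked in translation reading frame with the remainder of the coding sequence can be illustrated with a short check. The sketch below is a simplified example under stated assumptions: both sequences are invented for illustration, the function names are hypothetical, and only frame and premature stop codons are examined.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split a DNA sequence into complete codons, ignoring any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def fuse_in_frame(signal_cds, mature_cds):
    """Join a signal peptide coding region to a coding sequence, checking the frame."""
    signal_cds = signal_cds.upper()
    mature_cds = mature_cds.upper()
    if not signal_cds.startswith("ATG"):
        raise ValueError("signal peptide coding region must begin with a start codon")
    if len(signal_cds) % 3 != 0:
        raise ValueError("signal peptide coding region would shift the reading frame")
    if len(mature_cds) % 3 != 0:
        raise ValueError("coding sequence is not a whole number of codons")
    fused = signal_cds + mature_cds
    # Every codon except the last must be a sense codon.
    if any(codon in STOP_CODONS for codon in codons(fused)[:-1]):
        raise ValueError("premature stop codon in the fused coding region")
    return fused


# Invented sequences used only to exercise the check.
signal = "ATGGCTAAGCTT"      # ATG GCT AAG CTT
mature = "GGTACCGAATTCTAA"   # GGT ACC GAA TTC TAA
fused = fuse_in_frame(signal, mature)
print(fused, codons(fused))
```

Removing a single nucleotide from the invented signal region causes the check to fail, which corresponds to a fusion in which the secreted protein would no longer be translated in its own frame.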




A nucleic acid molecule of the invention encoding a protein or fragment thereof may also
be linked to a propeptide coding region. A propeptide is an amino acid sequence found at the
amino terminus of a proprotein or proenzyme. Cleavage of the propeptide from the proprotein
yields a mature biochemically active protein. The resulting polypeptide is known as a
propolypeptide or proenzyme (or a zymogen in some cases). Propolypeptides are generally
inactive and can be converted to mature active polypeptides by catalytic or autocatalytic cleavage
of the propeptide from the propolypeptide or proenzyme. The propeptide coding region may be
native to the protein or fragment thereof or may be obtained from foreign sources. The foreign
propeptide coding region may be obtained from the Saccharomyces cerevisiae alpha-factor gene
or Myceliophthora thermophila laccase gene (WO 95/33836, the entirety of which is herein
incorporated by reference).

The procedures used to ligate the elements described above to construct the recombinant
expression vector of the invention are well known to one skilled in the art (see, for example,
Sambrook et al., Molecular Cloning, A Laboratory Manual, 2nd ed., Cold Spring Harbor, N.Y.,
(1989)).
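
The elements described above may be pictured as an ordered cassette. The following sketch is illustrative only and rests on stated assumptions: the class and function names are hypothetical, only the coding sequence is treated as mandatory, and the relative order of the terminator and the polyadenylation sequence, both of which are described as linked at the 3' terminus, is an arbitrary choice made for this example.

```python
from dataclasses import dataclass

# Assumed 5' to 3' order of cassette elements (see the caveats in the lead-in).
ELEMENT_ORDER = [
    "promoter",
    "leader",
    "signal_peptide",
    "propeptide",
    "coding_sequence",
    "terminator",
    "polyadenylation",
]
REQUIRED = {"coding_sequence"}


@dataclass(frozen=True)
class Element:
    kind: str    # one of ELEMENT_ORDER
    source: str  # gene or organism the element was obtained from


def assemble(elements):
    """Validate a set of elements and return them in 5' to 3' order."""
    kinds = [element.kind for element in elements]
    unknown = set(kinds) - set(ELEMENT_ORDER)
    if unknown:
        raise ValueError(f"unknown element kinds: {sorted(unknown)}")
    if len(kinds) != len(set(kinds)):
        raise ValueError("each element kind may appear only once")
    missing = REQUIRED - set(kinds)
    if missing:
        raise ValueError(f"missing required elements: {sorted(missing)}")
    return sorted(elements, key=lambda element: ELEMENT_ORDER.index(element.kind))


cassette = assemble([
    Element("terminator", "Aspergillus niger glucoamylase"),
    Element("coding_sequence", "hypothetical protein of interest"),
    Element("promoter", "Aspergillus oryzae TAKA amylase"),
    Element("signal_peptide", "Aspergillus oryzae TAKA amylase"),
])
order = [element.kind for element in cassette]
print(order)
```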

The invention also relates to recombinant fungal host cells produced by the methods of
the invention which are advantageously used with the recombinant vector of the invention. The
cell is preferably transformed with a vector comprising a nucleic acid sequence of the invention
followed by integration of the vector into the host chromosome. The choice of fungal host cells
will to a large extent depend upon the gene encoding the protein or fragment thereof and its
source. The fungal host cell may, for example, be a yeast cell or a filamentous fungal cell.

"Yeast" as used herein includes Ascosporogenous yeast (Endomycetales), 
Basidiosporogenous yeast and yeast belonging to the Fungi Imperfecti (Blastomycetes). The
Ascosporogenous yeasts are divided into the families Spermophthoraceae and
Saccharomycetaceae. The latter is comprised of four subfamilies, Schizosaccharomycoideae (for
example, genus Schizosaccharomyces), Nadsonioideae, Lipomycoideae and Saccharomycoideae
(for example, genera Pichia, Kluyveromyces and Saccharomyces). The Basidiosporogenous
yeasts include the genera Leucosporidium, Rhodosporidium, Sporidiobolus, Filobasidium and
Filobasidiella. Yeast belonging to the Fungi Imperfecti are divided into two families,
Sporobolomycetaceae (for example, genera Sporobolomyces and Bullera) and Cryptococcaceae
(for example, genus Candida). Since the classification of yeast may change in the future, for the
purposes of this invention, yeast shall be defined as described in Biology and Activities of Yeast
(Skinner et al., Soc. App. Bacteriol. Symposium Series No. 9, (1980), the entirety of which is
herein incorporated by reference). The biology of yeast and manipulation of yeast genetics are
well known in the art (see, for example, Biochemistry and Genetics of Yeast, Bacil et al. (ed.),
2nd edition, 1987; The Yeasts, Rose and Harrison (eds.), 2nd ed., (1987); and The Molecular
Biology of the Yeast Saccharomyces, Strathern et al. (eds.), (1981), all of which are herein
incorporated by reference in their entirety).

"Fungi" as used herein includes the phyla Ascomycota, Basidiomycota, Chytridiomycota 
and Zygomycota (as defined by Hawksworth et al., In: Ainsworth and Bisby's Dictionary of The
Fungi, 8th edition, 1995, CAB International, University Press, Cambridge, UK; the entirety of
which is herein incorporated by reference) as well as the Oomycota (as cited in Hawksworth et
al., In: Ainsworth and Bisby's Dictionary of The Fungi, 8th edition, 1995, CAB International,
University Press, Cambridge, UK) and all mitosporic fungi (Hawksworth et al., In: Ainsworth
and Bisby's Dictionary of The Fungi, 8th edition, 1995, CAB International, University Press,
Cambridge, UK). Representative groups of Ascomycota include, for example, Neurospora,
Eupenicillium (= Penicillium), Emericella (= Aspergillus), Eurotium (= Aspergillus) and the true
yeasts listed above. Examples of Basidiomycota include mushrooms, rusts and smuts.
Representative groups of Chytridiomycota include, for example, Allomyces, Blastocladiella,
Coelomomyces and aquatic fungi. Representative groups of Oomycota include, for example,
Saprolegniomycetous aquatic fungi (water molds) such as Achlya. Examples of mitosporic fungi
include Aspergillus, Penicillium, Candida and Alternaria. Representative groups of Zygomycota
include, for example, Rhizopus and Mucor.

"Filamentous fungi" include all filamentous forms of the subdivision Eumycota and
Oomycota (as defined by Hawksworth et al., In: Ainsworth and Bisby's Dictionary of The Fungi,
8th edition, 1995, CAB International, University Press, Cambridge, UK). The filamentous fungi
are characterized by a vegetative mycelium composed of chitin, cellulose, glucan, chitosan,
mannan and other complex polysaccharides. Vegetative growth is by hyphal elongation and
carbon catabolism is obligately aerobic. In contrast, vegetative growth by yeasts such as
Saccharomyces cerevisiae is by budding of a unicellular thallus and carbon catabolism may be
fermentative.

In one embodiment, the fungal host cell is a yeast cell. In a preferred embodiment, the
yeast host cell is a cell of a species of Candida, Kluyveromyces, Saccharomyces,
Schizosaccharomyces, Pichia or Yarrowia. In a preferred embodiment, the yeast host cell is a
Saccharomyces cerevisiae cell, a Saccharomyces carlsbergensis cell, a Saccharomyces diastaticus
cell, a Saccharomyces douglasii cell, a Saccharomyces kluyveri cell, a Saccharomyces norbensis
cell, or a Saccharomyces oviformis cell. In another preferred embodiment, the yeast host cell is a
Kluyveromyces lactis cell. In another preferred embodiment, the yeast host cell is a Yarrowia
lipolytica cell.

In another embodiment, the fungal host cell is a filamentous fungal cell. In a preferred
embodiment, the filamentous fungal host cell is a cell of a species of, but not limited to,
Acremonium, Aspergillus, Fusarium, Humicola, Myceliophthora, Mucor, Neurospora,
Penicillium, Thielavia, Tolypocladium and Trichoderma.

The recombinant fungal host cells of the invention may further comprise one or more
sequences which encode one or more factors that are advantageous in the expression of the
protein or fragment thereof, for example, an activator (e.g., a trans-acting factor), a chaperone
and a processing protease. The nucleic acids encoding one or more of these factors are
preferably not operably linked to the nucleic acid encoding the protein or fragment thereof. An
activator is a protein which activates transcription of a nucleic acid sequence encoding a
polypeptide (Kudla et al., EMBO 9:1355-1364 (1990); Jarai and Buxton, Current Genetics
26:238-244 (1994); Verdier, Yeast 6:271-297 (1990), all of which are herein incorporated by
reference in their entirety). The nucleic acid sequence encoding an activator may be obtained
from the genes encoding Saccharomyces cerevisiae heme activator protein 1 (hap1),
Saccharomyces cerevisiae galactose metabolizing protein 4 (gal4) and Aspergillus nidulans
ammonia regulation protein (areA). For further examples, see Verdier, Yeast 6:271-297 (1990);
MacKenzie et al., Journal of Gen. Microbiol. 139:2295-2307 (1993), both of which are herein
incorporated by reference in their entirety. A chaperone is a protein which assists another
protein in folding properly (Hartl et al., TIBS 19:20-25 (1994); Bergeron et al., TIBS 19:124-128
(1994); Demolder et al., J. Biotechnology 32:179-189 (1994); Craig, Science 260:1902-1903
(1993); Gething and Sambrook, Nature 355:33-45 (1992); Puig and Gilbert, J. Biol. Chem.
269:7764-7771 (1994); Wang and Tsou, FASEB Journal 7:1515-1517 (1993); Robinson et al.,
Bio/Technology 7:381-384 (1994), all of which are herein incorporated by reference in their
entirety). The nucleic acid sequence encoding a chaperone may be obtained from the genes
encoding Aspergillus oryzae protein disulphide isomerase, Saccharomyces cerevisiae calnexin,
Saccharomyces cerevisiae BiP/GRP78 and Saccharomyces cerevisiae Hsp70. For further
examples, see Gething and Sambrook, Nature 355:33-45 (1992); Hartl et al., TIBS 19:20-25
(1994). A processing protease is a protease that cleaves a propeptide to generate a mature
biochemically active polypeptide (Enderlin and Ogrydziak, Yeast 10:67-79 (1994); Fuller et al.,
Proc. Natl. Acad. Sci. (U.S.A.) 86:1434-1438 (1989); Julius et al., Cell 37:1075-1089 (1984);
Julius et al., Cell 32:839-852 (1983), all of which are incorporated by reference in their entirety).
The nucleic acid sequence encoding a processing protease may be obtained from the genes
encoding Aspergillus niger Kex2, Saccharomyces cerevisiae dipeptidylaminopeptidase,
Saccharomyces cerevisiae Kex2 and Yarrowia lipolytica dibasic processing endoprotease (xpr6).
Any factor that is functional in the fungal host cell of choice may be used in the invention.

Fungal cells may be transformed by a process involving protoplast formation,
transformation of the protoplasts and regeneration of the cell wall in a manner known per se.
Suitable procedures for transformation of Aspergillus host cells are described in EP 238 023 and
Yelton et al., Proc. Natl. Acad. Sci. (U.S.A.) 81:1470-1474 (1984), both of which are herein
incorporated by reference in their entirety. A suitable method of transforming Fusarium species
is described by Malardier et al., Gene 78:147-156 (1989), the entirety of which is herein
incorporated by reference. Yeast may be transformed using the procedures described by Becker
and Guarente, In: Abelson and Simon, (eds.), Guide to Yeast Genetics and Molecular Biology,
Methods Enzymol. Volume 194, pp. 182-187, Academic Press, Inc., New York; Ito et al., J.
Bacteriology 153:163 (1983); Hinnen et al., Proc. Natl. Acad. Sci. (U.S.A.) 75:1920 (1978), all of
which are herein incorporated by reference in their entirety.

The invention also relates to methods of producing the protein or fragment thereof
comprising culturing the recombinant fungal host cells under conditions conducive for
expression of the protein or fragment thereof. The fungal cells of the invention are cultivated in
a nutrient medium suitable for production of the protein or fragment thereof using methods
known in the art. For example, the cell may be cultivated by shake flask cultivation, small-scale
or large-scale fermentation (including continuous, batch, fed-batch, or solid state fermentations)
in laboratory or industrial fermentors performed in a suitable medium and under conditions
allowing the protein or fragment thereof to be expressed and/or isolated. The cultivation takes
place in a suitable nutrient medium comprising carbon and nitrogen sources and inorganic salts,
using procedures known in the art (see, e.g., Bennett and LaSure (eds.), More Gene
Manipulations in Fungi, Academic Press, CA, (1991), the entirety of which is herein
incorporated by reference). Suitable media are available from commercial suppliers or may be
prepared according to published compositions (e.g., in catalogues of the American Type Culture
Collection, Manassas, VA). If the protein or fragment thereof is secreted into the nutrient
medium, the protein or fragment thereof can be recovered directly from the medium. If the
protein or fragment thereof is not secreted, it is recovered from cell lysates.

The expressed protein or fragment thereof may be detected using methods known in the
art that are specific for the particular protein or fragment. These detection methods may include
the use of specific antibodies, formation of an enzyme product, or disappearance of an enzyme
substrate. For example, if the protein or fragment thereof has enzymatic activity, an enzyme
assay may be used. Alternatively, if polyclonal or monoclonal antibodies specific to the protein
or fragment thereof are available, immunoassays may be employed using the antibodies to the
protein or fragment thereof. The techniques of enzyme assay and immunoassay are well known
to those skilled in the art.

The resulting protein or fragment thereof may be recovered by methods known in the art.
For example, the protein or fragment thereof may be recovered from the nutrient medium by
conventional procedures including, but not limited to, centrifugation, filtration, extraction,
spray-drying, evaporation, or precipitation. The recovered protein or fragment thereof may then
be further purified by a variety of chromatographic procedures, e.g., ion exchange
chromatography, gel filtration chromatography, affinity chromatography, or the like.

(g) Mammalian Constructs and Transformed Mammalian Cells

The invention also relates to methods for obtaining a recombinant mammalian host cell,
comprising introducing into a mammalian host cell exogenous genetic material. The invention
also relates to a mammalian cell comprising a mammalian recombinant vector. In a preferred
embodiment the exogenous genetic material includes a nucleic acid molecule of the present
invention, preferably a nucleic acid molecule having a sequence selected from the group
consisting of SEQ ID NO: 1 through SEQ ID NO: 621 or complements thereof or fragments of
either. Another preferred class of exogenous genetic material are nucleic acid molecules that
encode a protein having an amino acid sequence selected from the group consisting of SEQ ID
NO: 622 through SEQ ID NO: 626 or fragments thereof.

Mammalian cell lines available as hosts for expression are known in the art and include
many immortalized cell lines available from the American Type Culture Collection (ATCC,
Manassas, VA), such as HeLa cells, Chinese hamster ovary (CHO) cells, baby hamster kidney
(BHK) cells and a number of other cell lines.




Suitable promoters for mammalian cells are also known in the art and include viral
promoters, such as those from Simian Virus 40 (SV40) (Fiers et al., Nature 273:113 (1978)),
Rous sarcoma virus (RSV), adenovirus (ADV), cytomegalovirus (CMV), and bovine papilloma
virus (BPV), as well as mammalian cell-derived promoters. An exemplary, non-limiting list
includes: a hematopoietic stem cell-specific promoter, such as the CD34 promoter (Burn et al.,
U.S. Patent No. 5,556,954); the glucose-6-phosphatase promoter (Yoshiuchi et al., J. Clin.
Endocrin. Metab. 83:1016-1019 (1998)); interleukin-1 alpha promoter (Mori and Prager, Leuk.
Lymphoma 26:421-433 (1997)); CMV promoter (Tong et al., Anticancer Res. 18:719-725
(1998); Norman et al., Vaccine 15:801-803 (1997)); RSV promoter (Elshami et al., Cancer Gene
Ther. 4:213-221 (1997); Baldwin et al., Gene Ther. 4:1142-1149 (1997)); SV40 promoter
(Harms and Splitter, Hum. Gene Ther. 6:1291-1297 (1995)); CD11c integrin gene promoter
(Corbi and Lopez-Rodriguez, Leuk. Lymphoma 25:415-425 (1997)); GM-CSF promoter
(Shannon et al., Crit. Rev. Immunol. 17:301-323 (1997)); interleukin-5R alpha promoter (Sun et
al., Curr. Top. Microbiol. Immunol. 211:173-187 (1996)); interleukin-2 promoter (Serfing et al.,
Biochim. Biophys. Acta 1263:181-200 (1995); O'Neill et al., Transplant Proc. 23:2862-2866
(1991)); c-fos promoter (Janknecht, Immunobiology 193:137-142 (1995); Janknecht et al.,
Carcinogenesis 16:443-450 (1995); Takai et al., Princess Takamatsu Symp. 22:197-204 (1991));
h-ras promoter (Rachal et al., EXS 64:330-342 (1993)); and DMD gene promoter (Ray et al.,
Adv. Exp. Med. Biol. 280:107-111 (1990)). All of the above documents are incorporated by
reference in their entirety and can be relied on to make or use aspects of this invention, especially
in designing and constructing appropriate vector and host expression systems.

Vectors used in mammalian cell expression systems may also include additional
functional sequences. Examples include terminator sequences, poly-A addition sequences, and
internal ribosome entry site (IRES) sequences. Enhancer sequences, which increase expression,
may also be included, and sequences that promote amplification of the gene may also be desirable
(for example, methotrexate resistance genes). One of skill in the art is familiar with numerous
examples of these additional functional sequences, as well as other functional sequences, that
may optionally be included in an expression vector.

Vectors suitable for replication in mammalian cells may include viral replicons, or
sequences which ensure integration of the appropriate sequences encoding HCV epitopes into the
host genome. For example, another vector used to express foreign DNA is vaccinia virus. In this
case, for example, a nucleic acid molecule encoding a protein or fragment thereof is inserted into
the vaccinia genome. Techniques for the insertion of foreign DNA into the vaccinia virus
genome are known in the art and may utilize, for example, homologous recombination. Such
heterologous DNA is generally inserted into a gene which is non-essential to the virus, for
example, the thymidine kinase gene (tk), which also provides a selectable marker. Plasmid
vectors that greatly facilitate the construction of recombinant viruses have been described (see,
for example, Mackett et al., J. Virol. 49:857 (1984); Chakrabarti et al., Mol. Cell. Biol. 5:3403
(1985); Moss, In: Gene Transfer Vectors For Mammalian Cells (Miller and Calos, eds.), Cold
Spring Harbor Laboratory, N.Y., p. 10 (1987); all of which are herein incorporated by reference
in their entirety). Expression of the HCV polypeptide then occurs in cells or animals which are
infected with the live recombinant vaccinia virus.

The sequence to be integrated into the mammalian sequence may be introduced into the
primary host by any convenient means, which includes calcium precipitated DNA, spheroplast
fusion, transformation, electroporation, biolistics, lipofection, microinjection, or other convenient
means. Where an amplifiable gene is being employed, the amplifiable gene may serve as the
selection marker for selecting hosts into which the amplifiable gene has been introduced.
Alternatively, one may include with the amplifiable gene another marker, such as a drug
resistance marker, e.g., neomycin resistance (G418 in mammalian cells), hygromycin
resistance, etc., or an auxotrophy marker (HIS3, TRP1, LEU2, URA3, ADE2, LYS2, etc.) for use
in yeast cells.

Depending upon the nature of the modification and associated targeting construct, various
techniques may be employed for identifying targeted integration. Conveniently, the DNA may be
digested with one or more restriction enzymes and the fragments probed with an appropriate
DNA fragment which will identify the properly sized restriction fragment associated with
integration.
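
The restriction-digest screen described above can be illustrated with a minimal sketch. It assumes a toy locus, a single enzyme site (GAATTC, the EcoRI recognition sequence), and a made-up probe and cassette; none of these sequences come from the specification, and the cut position is simplified to the first base of each site.

```python
import re

def digest(sequence, site):
    """Return (start, end) coordinates of fragments produced by cutting at every site.

    Simplification: the cut is placed at the first base of each recognition site.
    """
    cuts = [m.start() for m in re.finditer(f"(?={site})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]

def probed_fragment_sizes(sequence, site, probe):
    """Sizes of the fragments that fully contain the probe sequence."""
    return [b - a for a, b in digest(sequence, site) if probe in sequence[a:b]]

# Hypothetical toy locus: two sites flank the region recognized by the probe.
SITE = "GAATTC"
PROBE = "ACGTACGTAC"
wild_type = "TTTT" + SITE + "CCCC" + PROBE + "GGGG" + SITE + "AAAA"

# Hypothetical targeted integration: a 30-base cassette inserted next to the probe.
cassette = "T" * 30
targeted = wild_type.replace(PROBE + "GGGG", PROBE + cassette + "GGGG")

wt_sizes = probed_fragment_sizes(wild_type, SITE, PROBE)
ki_sizes = probed_fragment_sizes(targeted, SITE, PROBE)
print("wild type:", wt_sizes, "targeted:", ki_sizes)
```

In this sketch the probed fragment from the targeted locus is longer than the wild-type fragment by exactly the cassette length, which is the size shift one would look for when identifying integration.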

One may use different promoter sequences, enhancer sequences, or other sequences which
will allow for enhanced levels of expression in the expression host. Thus, one may combine an
enhancer from one source, a promoter region from another source, a 5'-noncoding region
upstream from the initiation methionine from the same or different source as the other sequences
and the like. One may provide for an intron in the non-coding region with appropriate splice
sites or for an alternative 3'-untranslated sequence or polyadenylation site. Depending upon the
particular purpose of the modification, any of these sequences may be introduced, as desired.

Where selection is intended, the sequence to be integrated will have with it a marker
gene, which allows for selection. The marker gene may conveniently be downstream from the
target gene and may include resistance to a cytotoxic agent, e.g., antibiotics, heavy metals, or the
like, resistance or susceptibility to HAT, ganciclovir, etc., complementation to an auxotrophic
host, particularly by using an auxotrophic yeast as the host for the subject manipulations, or the
like. The marker gene may also be on a separate DNA molecule, particularly with primary
mammalian cells. Alternatively, one may screen the various transformants, due to the high
efficiency of recombination in yeast, by using hybridization analysis, PCR, sequencing, or the
like.

For homologous recombination, constructs can be prepared where the amplifiable gene
will be flanked, normally on both sides, with DNA homologous with the DNA of the target
region. Depending upon the nature of the integrating DNA and the purpose of the integration,
the homologous DNA will generally be within 100 kb, usually 50 kb, preferably about 25 kb, of the
transcribed region of the target gene, more preferably within 2 kb of the target gene. Where
modeling of the gene is intended, homology will usually be present proximal to the site of the
mutation. The homologous DNA may include the 5'-upstream region outside of the
transcriptional regulatory region or comprising any enhancer sequences, transcriptional initiation
sequences, adjacent sequences, or the like. The homologous region may include a portion of the
coding region, where the coding region may be comprised only of an open reading frame or
combination of exons and introns. The homologous region may comprise all or a portion of an
intron, where all or a portion of one or more exons may also be present. Alternatively, the
homologous region may comprise the 3'-region, so as to comprise all or a portion of the
transcriptional termination region, or the region 3' of this region. The homologous regions may
extend over all or a portion of the target gene or be outside the target gene comprising all or a
portion of the transcriptional regulatory regions and/or the structural gene.

The integrating constructs may be prepared in accordance with conventional ways, where
sequences may be synthesized, isolated from natural sources, manipulated, cloned, ligated,
subjected to in vitro mutagenesis, primer repair, or the like. At various stages, the joined
sequences may be cloned and analyzed by restriction analysis, sequencing, or the like. Usually
during the preparation of a construct where various fragments are joined, the fragments,
intermediate constructs and constructs will be carried on a cloning vector comprising a
replication system functional in a prokaryotic host, e.g., E. coli, and a marker for selection, e.g.,
biocide resistance, complementation to an auxotrophic host, etc. Other functional sequences may
also be present, such as polylinkers, for ease of introduction and excision of the construct or
portions thereof, or the like. A large number of cloning vectors are available such as pBR322,
the pUC series, etc. These constructs may then be used for integration into the primary
mammalian host.

In the case of the primary mammalian host, a replicating vector may be used. Usually,
such a vector will have a viral replication system, such as SV40, bovine papilloma virus,
adenovirus, or the like. The linear DNA sequence vector may also have a selectable marker for
identifying transfected cells. Selectable markers include the neo gene, allowing for selection
with G418, the herpes tk gene for selection with HAT medium, the gpt gene with mycophenolic
acid, complementation of an auxotrophic host, etc.

The vector may or may not be capable of stable maintenance in the host. Where the
vector is capable of stable maintenance, the cells will be screened for homologous integration of
the vector into the genome of the host, where various techniques for curing the cells may be
employed. Where the vector is not capable of stable maintenance, for example, where a
temperature sensitive replication system is employed, one may change the temperature from the
permissive temperature to the non-permissive temperature, so that the cells may be cured of the
vector. In this case, only those cells having integration of the construct comprising the
amplifiable gene and, when present, the selectable marker, will be able to survive selection.

Where a selectable marker is present, one may select for the presence of the targeting
construct by means of the selectable marker. Where the selectable marker is not present, one
may select for the presence of the construct by the amplifiable gene. For the neo gene or the
herpes tk gene, one could employ a medium for growth of the transformants of about 0.1-1
mg/ml of G418 or may use HAT medium, respectively. Where DHFR is the amplifiable gene,
the selective medium may include from about 0.01-0.5 µM of methotrexate or be deficient in
glycine-hypoxanthine-thymidine and have dialysed serum (GHT media).
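
The selection conditions just listed can be collected into a small lookup, sketched below. The concentration ranges are those stated in the text; the dictionary layout and the function names are illustrative assumptions only.

```python
# Selection conditions as stated in the text. The data layout and function
# names are hypothetical, chosen only for illustration.
SELECTION = {
    "neo": {"agent": "G418", "range": (0.1, 1.0), "unit": "mg/ml"},
    "tk": {"agent": "HAT medium", "range": None, "unit": None},
    # DHFR may alternatively be selected in GHT-deficient medium with dialysed serum.
    "DHFR": {"agent": "methotrexate", "range": (0.01, 0.5), "unit": "uM"},
}

def describe_selection(marker):
    """Return a one-line description of the selective condition for a marker."""
    entry = SELECTION[marker]
    if entry["range"] is None:
        return f"{marker}: grow in {entry['agent']}"
    low, high = entry["range"]
    return f"{marker}: about {low}-{high} {entry['unit']} {entry['agent']}"

def within_stated_range(marker, concentration):
    """Check a proposed agent concentration against the range given in the text."""
    entry = SELECTION[marker]
    if entry["range"] is None:
        raise ValueError(f"{marker} is selected by medium composition, not concentration")
    low, high = entry["range"]
    return low <= concentration <= high

for name in SELECTION:
    print(describe_selection(name))
```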

The DNA can be introduced into the expression host by a variety of techniques that
include calcium phosphate/DNA co-precipitates, microinjection of DNA into the nucleus,
electroporation, yeast protoplast fusion with intact cells, transfection, polycations, e.g.,
polybrene, polyornithine, etc., or the like. The DNA may be single or double stranded DNA,
linear or circular. The various techniques for transforming mammalian cells are well known (see
Keown et al., Methods Enzymol. (1989); Keown et al., Methods Enzymol. 185:521-531 (1990);
Mansour et al., Nature 336:348-352 (1988); all of which are herein incorporated by reference in
their entirety).

(h) Insect Constructs and Transformed Insect Cells 

The invention also relates to an insect recombinant vector comprising exogenous genetic
material. The invention also relates to an insect cell comprising an insect recombinant vector.
The invention also relates to methods for obtaining a recombinant insect host cell, comprising
introducing into an insect cell exogenous genetic material. In a preferred embodiment the
exogenous genetic material includes a nucleic acid molecule of the present invention, preferably
a nucleic acid molecule having a sequence selected from the group consisting of SEQ ID NO: 1
through SEQ ID NO: 621 or complements thereof or fragments of either. Another preferred class
of exogenous genetic material is nucleic acid molecules that encode a protein having an amino
acid sequence selected from the group consisting of SEQ ID NO: 622 through SEQ ID NO: 626 or
fragments thereof.

The insect recombinant vector may be any vector which can be conveniently subjected to
recombinant DNA procedures and can bring about the expression of the nucleic acid sequence.
The choice of a vector will typically depend on the compatibility of the vector with the insect
host cell into which the vector is to be introduced. The vector may be a linear or a closed circular
plasmid. The vector system may be a single vector or plasmid or two or more vectors or
plasmids which together contain the total DNA to be introduced into the genome of the insect
host. In addition, the insect vector may be an expression vector. Nucleic acid molecules can be
suitably inserted into a replication vector for expression in the insect cell under a suitable
promoter for insect cells. Many vectors are available for this purpose and selection of the
appropriate vector will depend mainly on the size of the nucleic acid molecule to be inserted into
the vector and the particular host cell to be transformed with the vector. Each vector contains
various components depending on its function (amplification of DNA or expression of DNA) and
the particular host cell with which it is compatible. The vector components for insect cell
transformation generally include, but are not limited to, one or more of the following: a signal
sequence, origin of replication, one or more marker genes and an inducible promoter.

The insect vector may be an autonomously replicating vector, i.e., a vector which exists
as an extrachromosomal entity, the replication of which is independent of chromosomal
replication, e.g., a plasmid, an extrachromosomal element, a minichromosome, or an artificial
chromosome. The vector may contain any means for assuring self-replication. Alternatively, the
vector may be one which, when introduced into the insect cell, is integrated into the genome and
replicated together with the chromosome(s) into which it has been integrated. For integration,
the vector may rely on the nucleic acid sequence of the vector for stable integration of the vector
into the genome by homologous or nonhomologous recombination. Alternatively, the vector
may contain additional nucleic acid sequences for directing integration by homologous
recombination into the genome of the insect host. The additional nucleic acid sequences enable
the vector to be integrated into the host cell genome at a precise location(s) in the
chromosome(s). To increase the likelihood of integration at a precise location, there should
preferably be two nucleic acid sequences which individually contain a sufficient number of
nucleotides, preferably 400 bp to 1500 bp, more preferably 800 bp to 1000 bp, which are highly
homologous with the corresponding target sequence to enhance the probability of homologous
recombination. These nucleic acid sequences may be any sequence that is homologous with a
target sequence in the genome of the insect host cell and, furthermore, may be non-encoding or
encoding sequences.
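
As a rough illustration of the length ranges given above for the two flanking homologous sequences, the following sketch classifies a candidate homology arm by length and computes a simple percent identity against an already aligned target. The function names and tier labels are assumptions made for this example; the specification gives no numeric threshold for "highly homologous", so none is applied here.

```python
def arm_length_tier(length_bp):
    """Classify an arm length against the ranges stated in the text."""
    if 800 <= length_bp <= 1000:
        return "more preferred"
    if 400 <= length_bp <= 1500:
        return "preferred"
    return "outside stated range"

def percent_identity(arm, target):
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if not arm or len(arm) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(arm, target))
    return 100.0 * matches / len(arm)

# Hypothetical arm lengths for a left and a right flanking sequence.
for label, length in (("left arm", 900), ("right arm", 450), ("short arm", 200)):
    print(label, length, "bp:", arm_length_tier(length))
```

A real design would align each arm to the host genome first; the identity function here assumes that alignment has already been done.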

Baculovirus expression vectors (BEVs) have become important tools for the expression
of foreign genes, both for basic research and for the production of proteins with direct clinical
applications in human and veterinary medicine (Doerfler, Curr. Top. Microbiol. Immunol.
131:51-68 (1986); Luckow and Summers, Bio/Technology 6:47-55 (1988a); Miller, Annual
Review of Microbiol. 42:177-199 (1988); Summers, Curr. Comm. Molecular Biology, Cold
Spring Harbor Press, Cold Spring Harbor, N.Y. (1988); all of which are herein incorporated by
reference in their entirety). BEVs are recombinant insect viruses in which the coding sequence
for a chosen foreign gene has been inserted behind a baculovirus promoter in place of the viral
gene, e.g., polyhedrin (Smith and Summers, U.S. Pat. No. 4,745,051, the entirety of which is
incorporated herein by reference).

The use of baculovirus vectors relies upon the host cells being derived from Lepidopteran
insects such as Spodoptera frugiperda or Trichoplusia ni. The preferred Spodoptera frugiperda
cell line is the cell line Sf9. The Spodoptera frugiperda Sf9 cell line was obtained from
American Type Culture Collection (Manassas, VA) and is assigned accession number ATCC
CRL 1711 (Summers and Smith, A Manual of Methods for Baculovirus Vectors and Insect Cell
Culture Procedures, Texas Ag. Exper. Station Bulletin No. 1555 (1988), the entirety of which is
herein incorporated by reference). Other insect cell systems, such as the silkworm B. mori, may
also be used.

The proteins expressed by the BEVs are, therefore, synthesized, modified and transported
in host cells derived from Lepidopteran insects. Most of the genes that have been inserted and
produced in the baculovirus expression vector system have been derived from vertebrate species.
Other baculovirus genes in addition to the polyhedrin promoter may be employed to advantage in
a baculovirus expression system. These include immediate-early (alpha), delayed-early (beta), late
(gamma), or very late (delta), according to the phase of the viral infection during which they are
expressed. The expression of these genes occurs sequentially, probably as the result of a
"cascade" mechanism of transcriptional regulation (Guarino and Summers, J. Virol. 57:563-571
(1986); Guarino and Summers, J. Virol. 61:2091-2099 (1987); Guarino and Summers, Virol.
162:444-451 (1988); all of which are herein incorporated by reference in their entirety).

Insect recombinant vectors are useful as intermediates for the infection or transformation
of insect cell systems. For example, an insect recombinant vector containing a nucleic acid
molecule encoding a baculovirus transcriptional promoter followed downstream by an insect
signal DNA sequence is capable of directing the secretion of the desired biologically active
protein from the insect cell. The vector may utilize a baculovirus transcriptional promoter region
derived from any of the over 500 baculoviruses generally infecting insects, such as, for example,
the Orders Lepidoptera, Diptera, Orthoptera, Coleoptera and Hymenoptera, including, for
example, but not limited to, the viral DNAs of Autographa californica MNPV, Bombyx mori NPV,
Trichoplusia ni MNPV, Rachiplusia ou MNPV or Galleria mellonella MNPV, wherein said
baculovirus transcriptional promoter is a baculovirus immediate-early gene IE1 or IEN promoter;
an immediate-early gene in combination with a baculovirus delayed-early gene promoter region
selected from the group consisting of 39K and a HindIII-k fragment delayed-early gene; or a
baculovirus late gene promoter. The immediate-early or delayed-early promoters can be
enhanced with transcriptional enhancer elements. The insect signal DNA sequence may code for
a signal peptide of a Lepidopteran adipokinetic hormone precursor or a signal peptide of the
Manduca sexta adipokinetic hormone precursor (Summers, U.S. Patent No. 5,155,037; the
entirety of which is herein incorporated by reference). Other insect signal DNA sequences
include a signal peptide of the Orthoptera Schistocerca gregaria locust adipokinetic hormone
precursor and the Drosophila melanogaster cuticle genes CP1, CP2, CP3 or CP4 or for an insect
signal peptide having substantially a similar chemical composition and function (Summers, U.S.
Patent No. 5,155,037).

Insect cells are distinctly different from animal cells. Insects have a unique life cycle and
have distinct cellular properties such as the lack of intracellular plasminogen activators, which
are present in vertebrate cells. Another difference is the high expression levels of protein
products ranging from 1 to greater than 500 mg/liter and the ease with which cDNA can be cloned
into cells (Frasier, In Vitro Cell. Dev. Biol. 25:225 (1989); Summers and Smith, In: A Manual of
Methods for Baculovirus Vectors and Insect Cell Culture Procedures, Texas Ag. Exper. Station
Bulletin No. 1555 (1988), both of which are incorporated by reference in their entirety).

Recombinant protein expression in insect cells is achieved by viral infection or stable
transformation. For viral infection, the desired gene is cloned into baculovirus at the site of the
wild-type polyhedrin gene (Webb and Summers, Technique 2:173 (1990); Bishop and Posse,
Adv. Gene Technol. 1:55 (1990); both of which are incorporated by reference in their entirety).
The polyhedrin gene encodes a component of the protein coat of the occlusions which encapsulate
virus particles. Deletion or insertion in the polyhedrin gene results in the failure to form occlusion
bodies. Occlusion negative viruses are morphologically different from occlusion positive viruses
and enable one skilled in the art to identify and purify recombinant viruses.

The vectors of the invention preferably contain one or more selectable markers, which permit
easy selection of transformed cells. A selectable marker is a gene the product of which provides,
for example, biocide or viral resistance, resistance to heavy metals, prototrophy to auxotrophs and
the like. Selection may be accomplished by co-transformation, e.g., as described in WO
91/17243. A nucleic acid sequence of the invention may be operably linked to a suitable promoter
sequence. The promoter sequence is a nucleic acid sequence which is recognized by the insect
host cell for expression of the nucleic acid sequence. The promoter sequence contains
transcription and translation control sequences, which mediate the expression of the protein or
fragment thereof. The promoter may be any nucleic acid sequence which shows transcriptional
activity in the insect host cell of choice and may be obtained from genes encoding polypeptides
either homologous or heterologous to the host cell.

For example, a nucleic acid molecule encoding a protein or fragment thereof may also be
operably linked to a suitable leader sequence. A leader sequence is a nontranslated region of an
mRNA, which is important for translation by the insect host. The leader sequence is operably
linked to the 5' terminus of the nucleic acid sequence encoding the protein or fragment thereof.
The leader sequence may be native to the nucleic acid sequence encoding the protein or fragment
thereof or may be obtained from foreign sources. Any leader sequence which is functional in the
insect host cell of choice may be used in the invention.

A polyadenylation sequence may also be operably linked to the 3' terminus of the nucleic
acid sequence of the invention. The polyadenylation sequence is a sequence which, when
transcribed, is recognized by the insect host to add polyadenosine residues to transcribed mRNA.
The polyadenylation sequence may be native to the nucleic acid sequence encoding the protein or
fragment thereof or may be obtained from foreign sources. Any polyadenylation sequence
which is functional in the insect host of choice may be used in the invention.

To avoid the necessity of disrupting the cell to obtain the protein or fragment thereof and
to minimize the amount of possible degradation of the expressed polypeptide within the cell, it is
preferred that expression of the polypeptide gene gives rise to a product secreted outside the cell.
To this end, the protein or fragment thereof of the invention may be linked to a signal peptide
linked to the amino terminus of the protein or fragment thereof. A signal peptide is an amino
acid sequence which permits the secretion of the protein or fragment thereof from the insect host
into the culture medium. The signal peptide may be native to the protein or fragment thereof of
the invention or may be obtained from foreign sources. The 5' end of the coding sequence of the
nucleic acid sequence of the invention may inherently contain a signal peptide coding region
naturally linked in translation reading frame with the segment of the coding region which
encodes the secreted protein or fragment thereof.

At present, a mode of achieving secretion of a foreign gene product in insect cells is by
way of the foreign gene's native signal peptide. Because the foreign genes are usually from
non-insect organisms, their signal sequences may be poorly recognized by insect cells and, hence,
levels of expression may be suboptimal. However, the efficiency of expression of foreign gene
products seems to depend primarily on the characteristics of the foreign protein. On average,
nuclear localized or non-structural proteins are most highly expressed, secreted proteins are
intermediate and integral membrane proteins are the least expressed. One factor generally
affecting the efficiency of the production of foreign gene products in a heterologous host system
is the presence of native signal sequences (also termed presequences, targeting signals, or leader
sequences) associated with the foreign gene. The signal sequence is generally coded by a DNA
sequence immediately following (5' to 3') the translation start site of the desired foreign gene.

The expression dependence on the type of signal sequence associated with a gene product
can be represented by the following example. If a foreign gene is inserted at a site downstream
from the translational start site of the baculovirus polyhedrin gene so as to produce a fusion
protein (containing the N-terminus of the polyhedrin structural gene), the fused gene is highly
expressed. But less expression is achieved when a foreign gene is inserted in a baculovirus
expression vector immediately following the transcriptional start site and totally replacing the
polyhedrin structural gene.

Insertions into the region -50 to -1 significantly alter (reduce) steady state transcription
which, in turn, reduces translation of the foreign gene product. Use of the pVL941 vector
optimizes transcription of foreign genes to the level of the polyhedrin gene transcription. Even
though the transcription of a foreign gene may be optimal, optimal translation may vary because
of several factors involving processing, for example: signal peptide recognition, mRNA and
ribosome binding, glycosylation, disulfide bond formation, sugar processing, and oligomerization.

The properties of the insect signal peptide are expected to be more favorable for the
efficiency of the translation process in insect cells than those from vertebrate proteins. This
phenomenon can generally be explained by the fact that proteins secreted from cells are
synthesized as precursor molecules containing hydrophobic N-terminal signal peptides. The
signal peptides direct transport of the select protein to its target membrane and are then cleaved
by a peptidase on the membrane, such as the endoplasmic reticulum, when the protein passes
through it.
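
The hydrophobic character of an N-terminal signal peptide can be illustrated with a short sketch that scores a sliding window using the Kyte-Doolittle hydropathy scale. This is not a method described in the specification; the window size, the N-terminal region length, and the example sequence are hypothetical choices, and a real signal peptide prediction would require a dedicated tool.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydropathy(peptide):
    """Average Kyte-Doolittle score over a peptide."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    return sum(KD[aa] for aa in peptide) / len(peptide)

def max_window_hydropathy(protein, n_term=30, window=8):
    """Highest mean hydropathy of any window within the first n_term residues.

    n_term and window are illustrative defaults, not values from the text.
    """
    region = protein[:n_term]
    if len(region) <= window:
        return mean_hydropathy(region)
    return max(
        mean_hydropathy(region[i:i + window])
        for i in range(len(region) - window + 1)
    )

# Hypothetical precursor: a hydrophobic N-terminal stretch followed by a charged body.
precursor = "MKLLLLAVLLALVSA" + "DEKRDEKRDEKR"
print(round(max_window_hydropathy(precursor), 2))
print(round(mean_hydropathy("DEKRDEKR"), 2))
```

A strongly positive N-terminal window next to a hydrophilic mature region is the pattern the paragraph above describes for secreted precursors.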

Another exemplary insect signal sequence is the sequence encoding Drosophila
cuticle proteins such as CP1, CP2, CP3 or CP4 (Summers, U.S. Patent No. 5,278,050; the
entirety of which is herein incorporated by reference). Most of a 9 kb region of the Drosophila
genome containing genes for the cuticle proteins has been sequenced. Four of the five cuticle
genes contain a signal peptide coding sequence interrupted by a short intervening sequence
(about 60 base pairs) at a conserved site. Conserved sequences occur in the 5' mRNA
untranslated region, in the adjacent 35 base pairs of upstream flanking sequence and at -200 base
pairs from the mRNA start position in each of the cuticle genes.

Standard methods of insect cell culture, cotransfection and preparation of plasmids are set
forth in Summers and Smith (Summers and Smith, A Manual of Methods for Baculovirus
Vectors and Insect Cell Culture Procedures, Texas Agricultural Experiment Station Bulletin No.
1555, Texas A&M University (1987)). Procedures for the cultivation of viruses and cells are
described in Volkman and Summers, J. Virol. 19:820-832 (1975) and Volkman et al., J. Virol.
19:820-832 (1976); both of which are herein incorporated by reference in their entirety.

(i) Bacterial Constructs and Transformed Bacterial Cells

The invention also relates to a bacterial recombinant vector comprising exogenous
genetic material. The invention also relates to a bacterial cell comprising a bacterial recombinant
vector. The invention also relates to methods for obtaining a recombinant bacterial host cell,
comprising introducing into a bacterial host cell exogenous genetic material. In a preferred
embodiment the exogenous genetic material includes a nucleic acid molecule of the present
invention, preferably a nucleic acid molecule having a sequence selected from the group
consisting of SEQ ID NO: 1 through SEQ ID NO: 621 or complements thereof or fragments of
either. Another preferred class of exogenous genetic material is nucleic acid molecules that
encode a protein having an amino acid sequence selected from the group consisting of
SEQ ID NO: 622 through SEQ ID NO: 626 or fragments thereof.

The bacterial recombinant vector may be any vector that can be conveniently subjected to
recombinant DNA procedures. The choice of a vector will typically depend on the compatibility
of the vector with the bacterial host cell into which the vector is to be introduced. The vector
may be a linear or a closed circular plasmid. The vector system may be a single vector or
plasmid or two or more vectors or plasmids, which together contain the total DNA to be
introduced into the genome of the bacterial host. In addition, the bacterial vector may be an
expression vector. Nucleic acid molecules encoding protein homologues or fragments thereof
can, for example, be suitably inserted into a replicable vector for expression in the bacterium
under the control of a suitable promoter for bacteria. Many vectors are available for this purpose
and selection of the appropriate vector will depend mainly on the size of the nucleic acid to be
inserted into the vector and the particular host cell to be transformed with the vector. Each
vector contains various components depending on its function (amplification of DNA or
expression of DNA) and the particular host cell with which it is compatible. The vector
components for bacterial transformation generally include, but are not limited to, one or more of
the following: a signal sequence, an origin of replication, one or more marker genes and an
inducible promoter.

In general, plasmid vectors containing replicon and control sequences that are derived
from species compatible with the host cell are used in connection with bacterial hosts. The
vector ordinarily carries a replication site, as well as marking sequences that are capable of
providing phenotypic selection in transformed cells. For example, E. coli is typically
transformed using pBR322, a plasmid derived from an E. coli species (see, e.g., Bolivar et al.,
Gene 2:95 (1977); the entirety of which is herein incorporated by reference). The plasmid
pBR322 contains genes for ampicillin and tetracycline resistance and thus provides easy means
for identifying transformed cells. The pBR322 plasmid, or other microbial plasmid or phage,
also generally contains, or is modified to contain, promoters that can be used by the microbial
organism for expression of the selectable marker genes.

Nucleic acid molecules encoding protein or fragments thereof may be expressed not only
directly, but also as a fusion with another polypeptide, preferably a signal sequence or other
polypeptide having a specific cleavage site at the N-terminus of the mature polypeptide. In
general, the signal sequence may be a component of the vector, or it may be a part of the
polypeptide DNA that is inserted into the vector. The heterologous signal sequence selected
should be one that is recognized and processed (i.e., cleaved by a signal peptidase) by the host
cell. For bacterial host cells that do not recognize and process the native polypeptide signal
sequence, the signal sequence is substituted by a bacterial signal sequence selected, for example,
from the group consisting of the alkaline phosphatase, penicillinase, lpp, or heat-stable
enterotoxin II leaders.

Both expression and cloning vectors contain a nucleic acid sequence that enables the
vector to replicate in one or more selected host cells. Generally, in cloning vectors this sequence
is one that enables the vector to replicate independently of the host chromosomal DNA and
includes origins of replication or autonomously replicating sequences. Such sequences are well
known for a variety of bacteria. The origin of replication from the plasmid pBR322 is suitable for
most Gram-negative bacteria.

Expression and cloning vectors also generally contain a selection gene, also termed a
selectable marker. This gene encodes a protein necessary for the survival or growth of
transformed host cells grown in a selective culture medium. Host cells not transformed with the
vector containing the selection gene will not survive in the culture medium. Typical selection
genes encode proteins that (a) confer resistance to antibiotics or other toxins, e.g., ampicillin,
neomycin, methotrexate, or tetracycline, (b) complement auxotrophic deficiencies, or (c) supply
critical nutrients not available from complex media, e.g., the gene encoding D-alanine racemase
for Bacilli. One example of a selection scheme utilizes a drug to arrest growth of a host cell.
Those cells that are successfully transformed with a heterologous protein homologue or fragment
thereof produce a protein conferring drug resistance and thus survive the selection regimen.

The expression vector for producing a protein or fragment thereof can also contain an
inducible promoter that is recognized by the host bacterial organism and is operably linked to the
nucleic acid molecule encoding, for example, the protein homologue or fragment thereof of
interest. Inducible promoters suitable for use with bacterial hosts include the
β-lactamase and lactose promoter systems (Chang et al., Nature 275:615 (1978); Goeddel et al.,
Nature 281:544 (1979); both of which are herein incorporated by reference in their entirety), the
arabinose promoter system (Guzman et al., J. Bacteriol. 174:1116-1128 (1992); the entirety of
which is herein incorporated by reference), alkaline phosphatase, a tryptophan (trp) promoter
system (Goeddel, Nucleic Acids Res. 8:4057 (1980); EP 36,776; both of which are herein
incorporated by reference in their entirety) and hybrid promoters such as the tac promoter
(deBoer et al., Proc. Natl. Acad. Sci. (USA) 80:21-25 (1983); the entirety of which is herein
incorporated by reference). However, other known bacterial inducible promoters are suitable
(Siebenlist et al., Cell 20:269 (1980); the entirety of which is herein incorporated by reference).
Promoters for use in bacterial systems also generally contain a Shine-Dalgarno (S.D.)
sequence operably linked to the DNA encoding the polypeptide of interest. The promoter can be
removed from the bacterial source DNA by restriction enzyme digestion and inserted into the
vector containing the desired DNA.

Construction of suitable vectors containing one or more of the above-listed components
employs standard ligation techniques. Isolated plasmids or DNA fragments are cleaved, tailored
and re-ligated in the form desired to generate the plasmids required. Examples of available
bacterial expression vectors include, but are not limited to: the multifunctional E. coli cloning
and expression vectors such as Bluescript™ (Stratagene, La Jolla, CA), in which, for example,
a nucleic acid molecule encoding an A. nidulans protein homologue or fragment thereof may be
ligated into the vector in frame with sequences for the amino-terminal Met and the subsequent
7 residues of β-galactosidase so that a hybrid protein is produced; pIN vectors (Van Heeke and
Schuster, J. Biol. Chem. 264:5503-5509 (1989), the entirety of which is herein incorporated by
reference); and the like. pGEX vectors (Promega, Madison, Wisconsin, U.S.A.) may also be used
to express foreign polypeptides as fusion proteins with glutathione S-transferase (GST). In
general, such fusion proteins are soluble and can easily be purified from lysed cells by adsorption
to glutathione-agarose beads followed by elution in the presence of free glutathione. Proteins
made in such systems are designed to optionally include heparin, thrombin, or factor Xa protease
cleavage sites so that the cloned polypeptide of interest can be released from the GST moiety at
will. Proteins or polypeptides of the invention can be expressed as variants that facilitate
purification. For example, fusion proteins with such proteins as maltose binding protein (MBP),
glutathione-S-transferase (GST) or thioredoxin (TRX) are known in the art [New England
Biolabs, Beverly, Mass.; Pharmacia, Piscataway, NJ; and Invitrogen, San Diego, CA]. The
polypeptide or protein can also be a tagged variant to facilitate purification, such as with histidine
or methionine rich regions (His-Tag; available from LifeTechnologies Inc, Gaithersburg, MD)
that bind to metal ion affinity chromatography columns, or with an epitope that binds to a
specific antibody (Flag, available from Kodak, New Haven, Conn.). An exemplary, non-limiting
list of commercially available vectors suitable for fusion protein expression includes: pBR322
(Promega); pGEX (Amersham); pT7 (USB); pET (Novagen); pIBI (IBI); pProEX-1
(Gibco/BRL); pBluescript II (Stratagene); pTZ18R and pTZ19R (USB); pSE420 (Invitrogen);
pVL1392 (Invitrogen); pBlueBac (Invitrogen); pBacPAK (Clontech); pHIL (Invitrogen); pYES2
(Invitrogen); pCDNA (Invitrogen); and pREP (Invitrogen). A number of other purification
methods or means are also known and can be used. Reverse-phase high performance liquid
chromatography (RP-HPLC), optionally employing hydrophobic RP-HPLC media, e.g., silica
gel, can further purify the protein. Combinations of methods and means can also be employed to
provide a substantially purified recombinant polypeptide or protein.

Suitable host bacteria for a bacterial vector include archaebacteria and eubacteria,
especially eubacteria and most preferably Enterobacteriaceae. Examples of useful bacteria
include Escherichia, Enterobacter, Azotobacter, Erwinia, Bacillus, Pseudomonas, Klebsiella,
Proteus, Salmonella, Serratia, Shigella, Rhizobia, Vitreoscilla and Paracoccus. Suitable E. coli
hosts include E. coli W3110 (American Type Culture Collection (ATCC) 27,325, Manassas,
Virginia U.S.A.), E. coli 294 (ATCC 31,446), E. coli B and E. coli X1776 (ATCC 31,537).
These examples are illustrative rather than limiting. Mutant cells of any of the above-mentioned
bacteria may also be employed. It is, of course, necessary to select the appropriate bacteria
taking into consideration replicability of the replicon in the cells of a bacterium. For example, E.
coli, Serratia, or Salmonella species can be suitably used as the host when well known plasmids
such as pBR322, pBR325, pACYC177, or pKN410 are used to supply the replicon. E. coli strain
W3110 is a preferred host or parent host because it is a common host strain for recombinant
DNA product fermentations. Preferably, the host cell should secrete minimal amounts of
proteolytic enzymes.

Host cells are transfected and preferably transformed with the above-described vectors
and cultured in conventional nutrient media modified as appropriate for inducing promoters,
selecting transformants, or amplifying the genes encoding the desired sequences.

Numerous methods of transfection are known to the ordinarily skilled artisan, for
example, calcium phosphate and electroporation. Depending on the host cell used,
transformation is done using standard techniques appropriate to such cells. The calcium
treatment employing calcium chloride, as described in section 1.82 of Sambrook et al.,
Molecular Cloning: A Laboratory Manual, New York: Cold Spring Harbor Laboratory Press,
(1989), is generally used for bacterial cells that contain substantial cell-wall barriers. Another
method for transformation employs polyethylene glycol/DMSO, as described in Chung and
Miller (Chung and Miller, Nucleic Acids Res. 16:3580 (1988); the entirety of which is herein
incorporated by reference). Yet another method is the use of the technique termed
electroporation.

Bacterial cells used to produce the polypeptide of interest for purposes of this invention
are cultured in suitable media in which the promoters for the nucleic acid encoding the
heterologous polypeptide can be artificially induced as described generally, e.g., in Sambrook et
al., Molecular Cloning: A Laboratory Manual, New York: Cold Spring Harbor Laboratory Press,
(1989). Examples of suitable media are given in U.S. Pat. Nos. 5,304,472 and 5,342,763; both of
which are incorporated by reference in their entirety.

In addition to the above discussed procedures, practitioners are familiar with the standard
resource materials which describe specific conditions and procedures for the construction,
manipulation and isolation of macromolecules (e.g., DNA molecules, plasmids, etc.), generation
of recombinant organisms and the screening and isolating of clones (see, for example, Sambrook
et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press (1989); Maliga et
al., Methods in Plant Molecular Biology, Cold Spring Harbor Press (1995), the entirety of which
is herein incorporated by reference; Birren et al., Genome Analysis: Analyzing DNA, 1, Cold
Spring Harbor, New York, the entirety of which is herein incorporated by reference).

(j) Algal Constructs and Algal Transformants

The present invention also relates to an algal recombinant vector comprising exogenous
genetic material. The present invention also relates to an algal cell comprising an algal
recombinant vector. The present invention also relates to methods for obtaining a recombinant
algal host cell comprising introducing into an algal host cell exogenous genetic material.

Exogenous genetic material is any genetic material, whether naturally occurring or
otherwise, from any source that is capable of being inserted into any organism. Exogenous
genetic material may be transferred into an algal cell. In a preferred embodiment the exogenous
genetic material includes a nucleic acid molecule having a sequence selected from the group
consisting of SEQ ID NO: 1 through SEQ ID NO: 621 or complements thereof. Another
preferred class of exogenous genetic material is nucleic acid molecules that encode a protein
having an amino acid sequence selected from the group consisting of SEQ ID NO: 622 through
SEQ ID NO: 626 or fragments thereof.

The algal recombinant vector may be any vector which can be conveniently subjected to
recombinant DNA procedures. The choice of a vector will typically depend on the compatibility
of the vector with the algal host cell into which the vector is to be introduced. The vector may be
a linear or a closed circular plasmid. The vector system may be a single vector or plasmid or two
or more vectors or plasmids which together contain the total DNA to be introduced into the
genome of the algal host.

The algal vector may be an autonomously replicating vector, i.e., a vector which exists as
an extrachromosomal entity, the replication of which is independent of chromosomal replication,
e.g., a plasmid, an extrachromosomal element, a minichromosome, or an artificial chromosome.
The vector may contain any means for assuring self-replication. Alternatively, the vector may be
one which, when introduced into the algal cell, is integrated into the genome and replicated
together with the chromosome(s) into which it has been integrated. For integration, the vector
may rely on the nucleic acid sequence of the vector for stable integration of the vector into the
genome by homologous or nonhomologous recombination. Alternatively, the vector may contain
additional nucleic acid sequences for directing integration by homologous recombination into the
genome of the algal host. The additional nucleic acid sequences enable the vector to be
integrated into the host cell genome at a precise location(s) in the chromosome(s). To increase
the likelihood of integration at a precise location, there should be preferably two nucleic acid
sequences which individually contain a sufficient number of nucleic acids, preferably 400 bp to
1500 bp, more preferably 800 bp to 1000 bp, which are highly homologous with the
corresponding target sequence to enhance the probability of homologous recombination. These
nucleic acid sequences may be any sequence that is homologous with a target sequence in the
genome of the algal host cell, and, furthermore, may be non-encoding or encoding sequences.

The vectors of the present invention preferably contain one or more selectable markers
which permit easy selection of transformed cells. A selectable marker is a gene, the product of
which confers upon an algal cell resistance to a compound to which the algal cell would otherwise
be sensitive. The compound can be selected from the group consisting of antibiotics, fungicides,
herbicides, and heavy metals. The selectable marker may be selected from any known or
subsequently identified selectable markers, including markers derived from algal, fungal, and
bacterial sources. Preferred selectable markers can be selected from the group including, but not
limited to, amdS (acetamidase), argB (ornithine carbamoyltransferase), bar (phosphinothricin
acetyltransferase), ble (bleomycin binding protein), cat (chloramphenicol acetyltransferase),
hygB (hygromycin B phosphotransferase), nat (nourseothricin acetyltransferase), niaD (nitrate
reductase), neo (neomycin phosphotransferase), pac (puromycin acetyltransferase), pyrG
(orotidine-5'-phosphate decarboxylase), sat (streptothricin acetyltransferase), sC (sulfate
adenyltransferase), trpC (anthranilate synthase), and glyphosate resistant EPSPS genes.
Furthermore, selection may be accomplished by co-transformation, e.g., as described in WO
91/17243, herein incorporated by reference in its entirety.

A nucleic acid sequence of the present invention may be operably linked to a suitable
promoter sequence. The promoter sequence is a nucleic acid sequence which is recognized by
the algal host cell for expression of the nucleic acid sequence. The promoter sequence contains
transcription and translation control sequences which mediate the expression of the protein or
fragment thereof.

A promoter may be any nucleic acid sequence which shows transcriptional activity in the
algal host cell of choice and may be obtained from genes encoding polypeptides either
homologous or heterologous to the host cell. Examples of suitable promoters for directing the
transcription of a nucleic acid construct of the invention in an algal host are light harvesting
protein promoters obtained from photosynthetic organisms, Chlorella virus methyltransferase
promoters, CaMV 35S promoter, PL promoter from bacteriophage λ, nopaline synthase
promoter from the Ti plasmid of Agrobacterium tumefaciens, and bacterial trp promoter.

A nucleic acid molecule of the present invention encoding a protein or fragment thereof
may also be operably linked to a terminator sequence at its 3' terminus. The terminator sequence
may be native to the nucleic acid sequence encoding the protein or fragment thereof or may be
obtained from foreign sources. Any terminator which is functional in the algal host cell of choice
may be used in the present invention.

A nucleic acid molecule of the present invention encoding a protein or fragment thereof
may also be operably linked to a suitable leader sequence. A leader sequence is a nontranslated
region of an mRNA which is important for translation by the algal host. The leader sequence is
operably linked to the 5' terminus of the nucleic acid sequence encoding the protein or fragment
thereof. The leader sequence may be native to the nucleic acid sequence encoding the protein or
fragment thereof or may be obtained from foreign sources. Any leader sequence which is
functional in the algal host cell of choice may be used in the present invention.

A polyadenylation sequence may also be operably linked to the 3' terminus of the nucleic
acid sequence of the present invention. The polyadenylation sequence is a sequence which when
transcribed is recognized by the algal host to add polyadenosine residues to transcribed mRNA.
The polyadenylation sequence may be native to the nucleic acid sequence encoding the protein or
fragment thereof or may be obtained from foreign sources. Any polyadenylation sequence which
is functional in the algal host of choice may be used in the present invention.

The procedures used to ligate the elements described above to construct the recombinant
expression vector of the present invention are well known to one skilled in the art (see, for
example, Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring
Harbor, N.Y., (1989), herein incorporated by reference in its entirety).

The present invention also relates to recombinant algal host cells produced by the
methods of the present invention which are advantageously used with the recombinant vector of
the present invention. The cell is preferably transformed with a vector comprising a nucleic acid
sequence of the invention followed by integration of the vector into the host chromosome. The
choice of algal host cells will to a large extent depend upon the gene encoding the protein or
fragment thereof and its source.

Algal cells may be transformed by a variety of known techniques, including but not limited
to, microprojectile bombardment, protoplast fusion, electroporation, microinjection, and vigorous
agitation in the presence of glass beads. Suitable procedures for transformation of green algal
host cells are described in EP 108 580, herein incorporated by reference in its entirety. A suitable
method of transforming Chlorella species is described by Jarvis and Brown, Curr. Genet. 19:
317-321 (1991), herein incorporated by reference in its entirety. A suitable method of
transforming cells of the diatom Phaeodactylum tricornutum is described in WO 97/39106,
herein incorporated by reference in its entirety. Chlorophyll C-containing algae may be
transformed using the procedures described in US 5,661,017, herein incorporated by reference in
its entirety.

The expressed protein or fragment thereof may be detected using methods known in the
art that are specific for the particular protein or fragment. These detection methods may include
the use of specific antibodies, formation of an enzyme product, or disappearance of an enzyme
substrate. For example, if the protein or fragment thereof has enzymatic activity, an enzyme
assay may be used. Alternatively, if polyclonal or monoclonal antibodies specific to the protein
or fragment thereof are available, immunoassays may be employed using the antibodies to the
protein or fragment thereof. The techniques of enzyme assay and immunoassay are well known
to those skilled in the art.

The resulting protein or fragment thereof may be recovered by methods known in the art.
For example, the protein or fragment thereof may be recovered from the nutrient medium by
conventional procedures including, but not limited to, centrifugation, filtration, extraction,
spray-drying, evaporation, or precipitation. The recovered protein or fragment thereof may then be
further purified by a variety of chromatographic procedures, e.g., ion exchange chromatography,
gel filtration chromatography, affinity chromatography, or the like.

Computer Readable Media

The nucleotide or amino acid sequence provided in SEQ ID NO: 1 through SEQ ID NO:
626, or fragment thereof, or complement thereof, or a nucleotide or an amino acid sequence at
least 70% identical, preferably 90% identical, even more preferably 99% or about 100% identical
to the sequence provided in SEQ ID NO: 1 through SEQ ID NO: 626, or where appropriate
complement thereof or fragments of either, can be "provided" in a variety of mediums to
facilitate use. Such a medium can also provide a subset thereof in a form that allows a skilled
artisan to examine the sequences.

A further preferred subset of nucleic acid sequences is one in which the subset of sequences
encodes two proteins or fragments thereof, more preferably three proteins or fragments thereof
and even more preferably four proteins or fragments thereof.

In one application of this embodiment, a nucleotide sequence of the invention can be
recorded on computer readable media so that a computer-readable medium comprises one or
more of the nucleotide sequences of the invention. As used herein, "computer readable media"
refers to any medium that can be read and accessed directly by a computer. Such media include,
but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium,
and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as
RAM and ROM; and hybrids of these categories such as magnetic/optical storage media.

Any number of the sequences, or sequence fragments, of the nucleic acid molecules or 
proteins of the invention, or fragments of either, can be included, in any number of combinations, 
on a computer-readable medium. Specifically, any one or more of SEQ ID NO: 1-626, or where 
appropriate, complements thereof, can be included. 

A skilled artisan can readily appreciate how any computer readable medium can be used 
to create a machine or method comprising a computer readable medium having recorded thereon 
a nucleotide sequence of the invention. As used herein, "recorded" refers to a process for storing 
information on computer readable medium. A skilled artisan can readily adopt any method for 
recording information on computer readable medium to generate media comprising the 
nucleotide sequence information of the invention. A variety of data storage structures are 
available to a skilled artisan for creating a computer readable medium having recorded thereon a 
nucleotide sequence of the invention. The choice of the data storage structure will generally be 
based on the means chosen to access the stored information. In addition, a variety of data 
processor programs and formats can be used to store the nucleotide sequence information of the 
invention on computer readable medium. The sequence information can be represented in a 
word processing text file, formatted in commercially-available software such as WordPerfect or 
Microsoft Word, or represented in the form of an ASCII file, stored in a database application, 
such as DB2, Sybase, Oracle, or the like. A skilled artisan can readily adapt any number of data 
processor structuring formats (e.g., text file or database) in order to obtain computer readable 
medium having recorded thereon the nucleotide sequence information of the invention. 
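
The two storage options named above (a plain ASCII text file and a database application) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the FASTA-style text layout, the use of SQLite in place of the named database products, the table and column names, and the function names. The sequences are placeholders and are not SEQ ID NO: 1-626.

```python
import sqlite3


def write_fasta(records, path, width=60):
    """Write (identifier, sequence) pairs to a plain ASCII FASTA-style file."""
    with open(path, "w", encoding="ascii") as handle:
        for identifier, sequence in records:
            handle.write(f">{identifier}\n")
            # Wrap the residues at a fixed width, one line per chunk.
            for start in range(0, len(sequence), width):
                handle.write(sequence[start:start + width] + "\n")


def read_fasta(path):
    """Read a FASTA-style file back into a dict of identifier -> sequence."""
    sequences = {}
    current = None
    with open(path, encoding="ascii") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                current = line[1:]
                sequences[current] = ""
            elif current is not None:
                sequences[current] += line
    return sequences


def store_in_database(records, connection):
    """Record (identifier, sequence) pairs in a hypothetical 'sequences' table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sequences "
        "(identifier TEXT PRIMARY KEY, residues TEXT NOT NULL)"
    )
    connection.executemany(
        "INSERT OR REPLACE INTO sequences VALUES (?, ?)", list(records)
    )
    connection.commit()


def fetch_from_database(connection, identifier):
    """Return the stored residues for one identifier, or None if absent."""
    row = connection.execute(
        "SELECT residues FROM sequences WHERE identifier = ?", (identifier,)
    ).fetchone()
    return row[0] if row else None
```

The choice between the two follows the point made in the text: a flat file suits sequential reading by search programs, while a database table suits lookup by identifier.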

By providing one or more of the nucleotide sequences of the invention, a skilled artisan can
routinely access the sequence information for a variety of purposes. Computer software is
publicly available that allows a skilled artisan to access sequence information provided in a
computer readable medium. The examples which follow demonstrate how software which
implements the BLAST (Altschul et al., J. Mol. Biol. 215:403-410 (1990), the entirety of which
is herein incorporated by reference) and BLAZE (Brutlag et al., Comp. Chem. 17:203-207
(1993), the entirety of which is herein incorporated by reference) search algorithms on a Sybase
system can be used to identify open reading frames (ORFs) within the genome that contain
homology to ORFs or proteins from other organisms. Such ORFs are protein-encoding
fragments within the sequences of the invention and are useful in producing commercially
important proteins such as enzymes used in amino acid biosynthesis, metabolism, transcription,
translation, RNA processing, nucleic acid and protein degradation, protein modification and
DNA replication, restriction, modification, recombination and repair.
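
As a minimal sketch of what identifying an open reading frame involves, the following Python function scans all six reading frames of a DNA sequence for stretches that begin with ATG and end at the first in-frame stop codon. It is a simplification and does not implement BLAST or BLAZE, which additionally compare candidate ORFs with sequences from other organisms. The function names, the `min_codons` threshold and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]


def find_orfs(sequence, min_codons=3):
    """Return (strand, frame, start, end, orf) for each ATG...stop stretch.

    Coordinates are 0-based on the scanned strand, end is exclusive, and
    the codon count includes both the ATG and the stop codon.
    """
    results = []
    strands = (("+", sequence.upper()), ("-", reverse_complement(sequence)))
    for strand, seq in strands:
        for frame in range(3):
            start = None
            for pos in range(frame, len(seq) - 2, 3):
                codon = seq[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos  # first start codon opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    end = pos + 3
                    if (end - start) // 3 >= min_codons:
                        results.append((strand, frame, start, end, seq[start:end]))
                    start = None  # resume scanning after the stop codon
    return results


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATTTTAGGG"):
        print(orf)
```

In practice the ORFs found this way would then be passed to a homology search program to test for similarity to known ORFs or proteins.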

The invention further provides systems, particularly computer-based systems, which
contain the sequence information described herein. Such systems are designed to identify
commercially important fragments of the nucleic acid molecule of the invention. As used herein,
"a computer-based system" refers to the hardware means, software means and data storage means
used to analyze the nucleotide sequence information of the invention. The minimum hardware
means of the computer-based systems of the invention comprises a central processing unit
(CPU), input means, output means and data storage means. A skilled artisan can readily
appreciate that any one of the currently available computer-based systems is suitable for use in
the invention.

As indicated above, the computer-based systems of the invention comprise a data storage
means having stored therein a nucleotide sequence of the invention and the necessary hardware
means and software means for supporting and implementing a search means. As used herein,
"data storage means" refers to memory that can store nucleotide sequence information of the
invention, or a memory access means which can access manufactures having recorded thereon
the nucleotide sequence information of the invention. As used herein, "search means" refers to
one or more programs which are implemented on the computer-based system to compare a target
sequence or target structural motif with the sequence information stored within the data storage
means. Search means are used to identify fragments or regions of the sequence of the invention
that match a particular target sequence or target motif. A variety of known algorithms are
disclosed publicly, and a variety of commercially available software for conducting search means
is available and can be used in the computer-based systems of the invention. Examples of such
software include, but are not limited to, MacPattern (EMBL), BLASTN and BLASTX
(NCBI). One of the available algorithms or implementing software packages for conducting
homology searches can be adapted for use in the present computer-based systems.

The most preferred sequence length of a target sequence is from about 10 to 100 amino
acids or from about 30 to 300 nucleotide residues. However, it is well recognized that
commercially important fragments of the nucleic acid molecules of the invention,
such as sequence fragments involved in gene expression and protein processing, may be of
shorter length.

As used herein, "a target structural motif," or "target motif," refers to any rationally
selected sequence or combination of sequences in which the sequence(s) are
chosen based on a three-dimensional configuration which is formed upon the folding of the target
motif. There are a variety of target motifs known in the art. Protein target motifs include, but are
not limited to, enzymatic active sites and signal sequences. Nucleic acid target motifs include,
but are not limited to, promoter sequences, cis elements, hairpin structures and inducible
expression elements (protein binding sequences).
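
To make the notion of a nucleic acid target motif concrete, the sketch below searches a sequence for one of the motif types listed above, a candidate hairpin, defined here (as an assumption) as a short stem followed by a loop and then the reverse complement of the stem. The function names and the default stem and loop lengths are hypothetical; real structure prediction also weighs base-pairing energies, which this sketch ignores.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]


def find_hairpins(sequence, stem=4, min_loop=3, max_loop=8):
    """Return (start, loop_length, left_arm, right_arm) for candidate hairpins.

    A candidate is a left arm of `stem` bases, a loop of min_loop..max_loop
    bases, and a right arm equal to the reverse complement of the left arm.
    """
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[start:start + stem]
        wanted = reverse_complement(left)
        for loop in range(min_loop, max_loop + 1):
            right_start = start + stem + loop
            right = seq[right_start:right_start + stem]
            if len(right) < stem:
                break  # ran off the end of the sequence
            if right == wanted:
                hits.append((start, loop, left, right))
    return hits


if __name__ == "__main__":
    for hit in find_hairpins("AAGCGCTTTTGCGCAA"):
        print(hit)
```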

Thus, the invention further provides an input means for receiving a target sequence, a
data storage means for storing the sequences of the invention identified using a
search means as described above and an output means for outputting the identified homologous
sequences. A variety of structural formats for the input and output means can be used to input
and output information in the computer-based systems of the invention. A preferred format for
an output means ranks fragments of the sequence of the invention by varying degrees of
homology to the target sequence or target motif. Such presentation provides a skilled artisan
with a ranking of sequences which contain various amounts of the target sequence or target motif
and identifies the degree of homology contained in the identified fragment.

A variety of comparing means can be used to compare a target sequence or target motif
with the data storage means to identify sequence fragments of the sequence of the invention. For
example, software which implements the BLAST and BLAZE algorithms (Altschul
et al., J. Mol. Biol. 215:403-410 (1990)) can be used to identify open reading frames within the
nucleic acid molecules of the invention. A skilled artisan can readily recognize that any one of the
publicly available homology search programs can be used as the search means for the
computer-based systems of the invention.
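
The ranked output format described above can be sketched as follows. This Python example is an illustration only: it scores every ungapped window of each stored sequence by percent identity to the target and sorts the windows from most to least similar. It is not BLAST or BLAZE, which use heuristic, gapped and statistically scored alignments, and the function names, identifiers and sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent of positions at which two equal-length strings agree."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def rank_fragments(target, stored, top=5):
    """Rank ungapped windows of the stored sequences by identity to the target.

    `stored` maps identifier -> sequence. Each hit is a tuple of
    (percent identity, identifier, 0-based start, matching window).
    """
    target = target.upper()
    if not target:
        raise ValueError("target sequence must not be empty")
    hits = []
    for identifier, sequence in stored.items():
        sequence = sequence.upper()
        for start in range(len(sequence) - len(target) + 1):
            window = sequence[start:start + len(target)]
            hits.append((percent_identity(target, window), identifier, start, window))
    # Highest identity first; ties broken by identifier, then position.
    hits.sort(key=lambda hit: (-hit[0], hit[1], hit[2]))
    return hits[:top]


if __name__ == "__main__":
    database = {"a": "GGGATGCCCGGG", "b": "TTTATGCACTTT"}
    for score, identifier, start, window in rank_fragments("ATGCCC", database, top=3):
        print(f"{score:6.1f}%  {identifier}  start={start}  {window}")
```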

Having now described the invention, the following examples are provided by way of 
illustration and are not intended to limit the scope of the invention, unless specified. 

Example 1

Identification of Yeast HES1 

The yeast strain LPY9 (MATa, leu2, ura3, his3) is grown overnight and inoculated into
SD+hul (histidine, uracil, leucine) media. Aliquots of the culture are treated with ketoconazole
(an inhibitor of the C-14α demethylase (P450 14DM) enzyme) at 10 µg/ml, 50 µg/ml, and 100 µg/ml,
corresponding to 10 ppm, 50 ppm, and 100 ppm, respectively. A sample of each is collected at 2,
4, and 6 hours after treatment. Control samples treated with DMSO (dimethyl sulfoxide, the solvent
for ketoconazole) but not with ketoconazole are also collected. Total RNA from each sample is
collected by conventional methods, such as a zirconium/silica bead binding and extraction
method. The sequence content of each sample is analyzed and compared by hybridizing each of
them to a number of yeast ORF sequences immobilized on a nylon membrane in an array format.

A similar comparison of a wild type yeast strain and a double mutant strain is made. The
double mutant CJ517 (MATa, erg11::URA3, erg3::LEU2, leu2, ura3, his4) [erg11, erg3 double
mutant] is compared to LPY9 after growth in both YPD and SD+hul media. Samples are
collected at approximately 0, 2, 4, and 6 hours after inoculation.
Table 2, below, lists the RNAs in each sample whose abundance is affected by
ketoconazole treatment or whose abundance differs between wild type and the double mutant
strain. The table also lists the corresponding gene or sequence identifier for those RNAs. The
RNAs are ranked by the ratio of either ketoconazole vs. control or mutant vs. control, using the
ratio of 50 ppm ketoconazole/control as a basis for comparison.
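
The ranking described above can be sketched, under stated assumptions, as a ratio calculation followed by a sort. The signal values, the condition labels used as dictionary keys, and the function names `abundance_ratio` and `rank_by_ratio` are hypothetical; the specification does not describe how the array signals were quantified or normalized.

```python
def abundance_ratio(treated: float, control: float) -> float:
    """Ratio of a treated (or mutant) hybridization signal to its control signal."""
    if control <= 0:
        raise ValueError("control signal must be positive")
    return treated / control


def rank_by_ratio(signals: dict, basis: str = "K-50"):
    """Rank ORFs by the treated/control ratio for the `basis` condition.

    `signals` maps an ORF name to a dict holding a control signal under "CK"
    and one signal per treatment, e.g. "K-50" and "K-100" (assumed labels).
    Returns a list of (orf, {condition: ratio}) pairs, highest ratio first.
    """
    rows = []
    for orf, s in signals.items():
        ratios = {
            cond: abundance_ratio(value, s["CK"])
            for cond, value in s.items()
            if cond != "CK"
        }
        rows.append((orf, ratios))
    rows.sort(key=lambda row: row[1][basis], reverse=True)
    return rows


if __name__ == "__main__":
    # Hypothetical signal intensities, for illustration only.
    example = {
        "ORF_A": {"CK": 10.0, "K-50": 50.0, "K-100": 20.0},
        "ORF_B": {"CK": 4.0, "K-50": 40.0, "K-100": 80.0},
    }
    for orf, ratios in rank_by_ratio(example):
        print(orf, ratios)
```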

Table 2* 



Seq.                           CJ-4hr/
Num.  Clone ID    ALIAS        LP-4hr       K-50/CK     K-100/CK
30    YOR237W     (HES1)       134.648161   1417.6262   1358.1235
31    YKL198C     (PTK1)       68.5845326   111.1984    233.11762
32    YLR465C     -            97.9601498   104.52215   133.57826
33    YMR129W     (POM152)     5.10206225   82.813831   15.392788
34    YBR284W     -            4.92774291   60.027955   8.5359554
35    YKL158W     -            11.6717854   59.827307   75.220412
36    YOR083W     -            31.7378598   51.606081   42.301568
37    YOL095C     -            3.60507866   49.740211   21.834188
38    YOR188W     (MSB1)       2.19997209   42.446767   61.303817
39    YBL109W     -            0.08616121   38.653463   75.964757
40    YLR091W     -            17.5946744   38.325073   44.556481
41    YNL106C     (INP52)      2.52986454   35.205536   17.376557
42    YDR213W     -            18.2079478   32.136065   58.358612
43    YBL004W     -            8.49387973   28.614573   28.645633
44    YIR019C     (MUC1)       48.7538739   27.594853   137.77885
45    YJL182C     -            2.53469593   26.891434   29.499298



Gene Description 

Protein implicated in ergosterol 
biosynthesis, member of the 
KES1/HES1/OSH1/YKR003W
family of oxysterol-binding 
(OSBP) proteins 
Serine/threonine protein kinase, 
activator of low-affinity, low- 
capacity polyamine transport 
Protein of unknown function, 
questionable ORF 
Nuclear pore membrane 
glycoprotein, type II integral 
membrane protein with N-terminal 
region on pore side and C-terminal 
region in the cisternae 
Protein with similarity to AMP 
deaminase 

Protein of unknown function 
Protein of unknown function 
Protein with similarity to S. aureus 
DNA helicase PCRA 
Protein that may play a role in 
polarity establishment and bud 
formation 

Protein of unknown function 

Protein with similarity to 
transcription factors, has ZN[2]- 
CYS[6] fungal-type binuclear 
cluster domain in the N-terminal 
region 

Protein with similarity to members
of the major facilitator superfamily
(MFS)

Glucoamylase I (alpha-1,4-glucan
glucosidase), extracellular enzyme
Protein of unknown function
Seq.                           CJ-4hr/
Num.  Clone ID    ALIAS        LP-4hr       K-50/CK     K-100/CK
46    YMR254C     -            0.19897977   26.633459   10.625738
47    YDL134C     (PPH21)      3.51284473   22.849241   0
48    YCR098C     (GIT1)       2.27672091   21.746838   24.724171
49    YPL150W     -            4.72964069   21.633895   34.40982
50    YKL110C     (KTI12)      19.7752946   21.085633   16.303432
51    YER011W     (TIR1)       31.4723195   20.454605   17.935906
52    YDL024C     -            3.96163383   20.381493   30.488098
53    YGR013W     -            0.10491681   20.364081   0
54    YOR325W     -            47.3518002   20.211317   29.305064
55    YJR150C     -            159.265973   19.793221   13.560079
56    YDL126C     (CDC48)      42.7590386   19.0472     15.014024
57    YLR464W     -            12.4297115   18.580843   36.516503
58    YLR124W     -            0.13902212   18.351487   11.026125
59    YLR463C     -            8.49721471   18.007814   29.811632
60    YMR297W     (PRC1)       6.20117404   17.995865   24.291751
61    YFL029C     (CAK1)       17.1104765   16.96782    44.352291
62    YER054C     (GIP2)       2.14214491   16.442373   15.284537
63    YER060W-A   (FCY22)      2.61677424   15.768882   20.550953
64    YEL076C     -            13.7918147   14.372278   26.325282
65    YGL176C     -            9.0823019    14.17085    16.23816
66    YNR005C     -            12.9230524   14.032659   13.011356
67    YML032C-A   -            6.92372404   13.847081   5.501802
68    YGR190C     -            22.9885796   13.701633   42.22779
69    YHR213W     -            17.3140804   13.267403   21010074



Gene Description 

Protein of unknown function, 
questionable ORF 

Protein involved in inositol 
metabolism 

Serine/threonine protein kinase of 
unknown function 
Protein involved in resistance to
Kluyveromyces lactis killer toxin
Stress-induced cell wall structural 
protein of the PAU1 family 
Protein with similarity to acid 
phosphatases 

Protein of unknown function 

Protein of the AAA family of 
ATPases, required for cell division 
and homotypic membrane fusion 
Protein with similarity to other 
subtelomerically-coded proteins 
Protein of unknown function 
Protein with similarity to other 
subtelomerically-coded proteins 
Carboxypeptidase Y (CPY) 
(YSCY), serine-type protease 
CDK-activating kinase 
(serine/threonine protein kinase) 
responsible for in vivo activation of 
CDC28P, also involved in spore 
wall formation 
GLC7P-interacting protein, 
possible regulatory subunit for the 
PP1 family protein phosphatase 
GLC7P 

Purine-cytosine permease with 
similarity to FCY2P, member of 
the purine/cytosine family of the 
major facilitator superfamily 
(MFS) 

Protein with similarity to other
subtelomerically-encoded proteins
Protein with similarity to Discopyge
ommata CA++-channel alpha 1
subunit protein B47447
Protein of unknown function,
questionable ORF

Protein of unknown function
Protein with similarity to the N-
terminus of FLO1P and identical to
YAR062P, probable pseudogene
Seq.                           CJ-4hr/
Num.  Clone ID    ALIAS        LP-4hr       K-50/CK     K-100/CK
70    YPL272C     -            24.778114    12.93877    11.647985
71    YBL100C     -            4.8456884    12.432421   16.193059
72    YLR024C     -            11.2130442   11.927798   17.73046
73    YMR102C     -            4.61311719   11.865115   16.370862
74    YGR177C     (ATF2)       3.7081426    11.830167   12.555269
75    YFR034C     (PHO4)       14.8112083   11.216073   20.844515
76    YNL282W     -            5.01708646   10.943286   13.050614
77    YPL176C     -            7.30789994   10.664169   18.424583
78    YMR015C     (ERG5)       10.2651358   10.313689   9.3557963
79    YCR061W     -            4.07462743   10.291287   12.602668
80    YHL030W     (ECM29)      4.85453872   10.275837   8.9818305
81    YPL036W     (PMA2)       7.19300398   10.171951   12.917306
82    YFR007W     -            2.58144987   10.102403   6.0105766
83    YOL067C     (RTG1)       30.4142081   10.027065   27.36633
84    YGR265W     -            22.156977    9.9554618   5.672919
85    YGR293C     -            51.4998515   9.7686634   8.066486
86    YMR008C     (PLB1)       5.68517668   9.602215    11.309345
87    YOR140W     -            6.33829162   9.2015298   12.881145
88    YML034W     -            4.44092944   9.2011248   15.848216
89    YGR176W     -            4.56487981   8.8866015   12.598661
90    YOR014W     (RTS1)       7.03478812   8.8422619   11.590438
91    YMR317W     -            25.9636363   8.6834125   11.973301
92    YOR301W     -            11.3702021   8.6327901   13.589223
93    YER119C-A   -            8.9509545    8.4086333   6.8517264
94    YOR385W     -            6.30021483   8.3714543   10.537348
95    YGL156W     (AMS1)       11.9450551   8.2732125   9.9190578
96    YJL219W     (HXT9)       6.10093958   8.1969449   14.860533



Gene Description 

Protein of unknown function 
Protein of unknown function 
Protein with similarity to ubiquitin- 
protein ligase (E3) UBR1P 

Alcohol O-acetyltransferase 
Basic helix-loop-helix (BHLH) 
transcription factor required for 
expression of phosphate pathway, 
hyperphosphorylation by PHO80P- 
PHO85P cyclin-dependent protein
kinase complex causes inactivation 

Protein with similarity to SSP134P 
Cytochrome P450 (C-22 sterol 
desaturase) 

Protein of unknown function 
Protein possibly involved in cell 
wall structure or biosynthesis 
H+-transporting P-type ATPase of 
the plasma membrane, expression 
not detected under normal growth 
conditions 

Protein of unknown function 
Basic helix-loop-helix (BHLH) 
transcription factor involved in 
inter-organelle communication 
between mitochondria, 
peroxisomes, and nucleus 
Protein of unknown function 
Protein of unknown function 
Phospholipase B 

(lysophospholipase), releases fatty 
acids from lysophospholipids 

Protein of unknown function 
Protein of unknown function 
Protein serine/threonine 
phosphatase 2A (PP2A), B' 
regulatory subunit, involved in 
regulation of stress-related 
responses and the cell cycle 
Protein of unknown function 
Protein of unknown function 

Protein of unknown function 
Alpha-mannosidase, hydrolyzes 
terminal non-reducing alpha-D- 
mannose residues from alpha-D- 
mannosides 

Hexose transporter, member of the 
sugar permease family 






Seq.                           CJ-4hr/
Num.  Clone ID    ALIAS        LP-4hr       K-50/CK     K-100/CK
97    YFL053W     -            3.55404282   8.1217569   6.04425
98    YNL279W     -            2.75618909   8.0041323   12.470971
99    YHR007C     (ERG11)      5.511691     7.8623796   8.6320676
100   YJL127C     (SPT10)      4.01528284   7.8394427   10.096027
101   YPL044C     -            2.61973879   7.8291062   4.5399013
102   YOR030W     (DFG16)      4.97362211   7.8182123   10.573213
103   YIL011W     -            4.59710634   7.3954743   6.7112038
104   YNR069C     -            14.3161508   7.3694614   14.104044
105   YNL083W     -            2.06305137   7.3050052   15.674556
106   YJL020C     -            6.76775321   7.0352757   5.3432583
107   YFL065C     -            13.5712126   7.0075571   16.704839
108   YNL329C     (PAS8)       3.75487269   6.7699941   25.980939
109   YHR006W     (STP2)       6.44648003   6.5480808   9.270283
110   YJL221C     (FSP2)       2.37104879   6.4365653   6.3055084
111   YMR037C     (MSN2)       6.80686734   6.4235969   7.6612989
112   YLR379W     -            6.34038543   6.4227358   6.8206953
113   YLR056W     (ERG3)       0.03858406   6.2735601   5.191422
114   YMR319C     (FET4)       3.5515443    6.2641804   8.194608
115   YBR045C     (GIP1)       5.88011982   6.254107    3.8135044
116   YKL147C     -            4.54862611   6.2431328   10.034699
117   YMR135W-A   -            15.3287997   6.1049555   4.611173
118   YCR048W     (ARE1)       9.11370518   6.1039374   10.531291



Gene Description 

Protein of unknown function 
Cytochrome P450 (lanosterol 
14alpha-demethylase), essential for 
biosynthesis of ergosterol 
Protein that amplifies the 
magnitude of transcriptional 
regulation at various loci 
Protein of unknown function 
Protein involved in invasive growth 
upon nitrogen starvation 
Protein with similarity to YIL176P, 
YIR041P and other members of the 
PAU1 family 

Protein of unknown function 
Protein of the mitochondrial carrier 
(MCF) family 

Protein of unknown function 
Protein with similarity to other 
subtelomerically-encoded proteins 
including YHL049P, YIL177P, 
YJL225P, YER190P, YHR218P, 
and YEL076P 

Protein involved in TRNA splicing 
and branched-chain amino acid 
uptake 

Protein with similarity to alpha-D- 
glucosidase (maltase) (FSP2 and 
YIL172C code for identical 
proteins) (YIL172C and YGR287C 
are nearly identical) 
Zinc-finger transcriptional activator 
for genes involved in the 
multistress response and genes 
regulated through SNF1P 
Protein of unknown function 
C-5 sterol desaturase, an iron non- 
heme oxygen-requiring enzyme of 
the ergosterol biosynthesis pathway 
Low-affinity Fe(II) transport 
protein 

GLC7P-interacting protein, 
possible regulatory subunit for the 
PP1 family protein phosphatases 
GLC7P 

Protein of unknown function 



Acyl-COA:sterol acyltransferase 
(sterol-ester synthetase) 
Seq.                           CJ-4hr/
Num.  Clone ID    ALIAS        LP-4hr       K-50/CK     K-100/CK
119   YBR235W     -            2.65851474   6.1026186   2.9854465
120   YJL160C     -            5.14571281   6.0795621   6.0193217
121   YNL287W     (SEC21)      5.55890054   6.0742978   5.8985117
122   YLR458W     -            28.2501296   5.9435623   4.6311951
123   YLR121C     -            4.04284936   5.9154936   8.131848
124   YLR347C     (KAP95)      3.84797845   5.8759152   6.4154978
125   YDL023C     -            3.26329833   5.8589624   4.7058193
126   YAL010C     (MDM10)      5.34077952   5.807758    8.9195451
127   YDR077W     (SED1)       3.30340602   5.6959082   5.9206909
128   YDR247W     -            3.28497642   5.6793015   6.7651448
129   YBL011W     -            3.59243122   5.650363    8.393684
130   YDL025C     -            2.91426204   5.5604876   3.9241843
131   YAL013W     (DEP1)       8.79366086   5.5463386   6.42501
132   YIL084C     (SDS3)       1.99582364   5.5430688   6.9074225
133   YJL213W     -            7.09632444   5.4980741   5.5079382
134   YKR053C     -            5.37724431   5.4952302   6.4562635
135   YNR042W     -            17.7115615   5.4798109   7.5527661
136   YCR072C     -            5.34712592   5.4565375   4.5985045
137   YER086W     (ILV1)       4.55278717   5.4449008   4.2437712
138   YJE076W     -            11.4128793   5.427721    6.6898119
139   YLR072W     -            5.19287856   5.4152299   7.2827024



Gene Description 

Protein with similarity to human
SLC12A1 gene for which
mutations are the cause of Bartter's
Syndrome

Protein with similarity to members
of the PIR1P/HSP150P/PIR3P
family

Coatomer complex gamma chain 
(gamma-COP) of secretory 
pathway vesicles, required for 
retrograde Golgi to endoplasmic 
reticulum transport 



Karyopherin-beta, acts to target
proteins with nuclear localization 
(NLS) sequences to the nuclear 
pore complex 

Protein of unknown function 
Protein involved in mitochondrial 
morphology and inheritance, 
mutant has large spherical 
mitochondria that do not move into 
the bud 

Abundant cell surface glycoprotein, 
overexpression suppresses growth 
defect of ERD2 

Serine/threonine protein kinase 
with similarity to S. pombe RAN1
negative regulator of sexual 
conjugation and meiosis 
(GB:Z49701) 

Serine/threonine protein kinase 
with similarity to members of the 
NPR1 subfamily 
Regulator of phospholipid 
metabolism 

Suppressor of silencing defect 
Protein with weak similarity to
Nocardia aryldialkylphosphatase

Protein of unknown function
Protein with similarity to nuclear
MRNA processing protein PRP4P,
member of WD (WD-40) repeat
family

Serine and threonine dehydratase
(anabolic), first step in isoleucine
biosynthesis pathway

Protein of unknown function
Seq.                           CJ-4hr/
Num.  Clone ID    ALIAS        LP-4hr       K-50/CK     K-100/CK
140   YDR301W     (YHH1)       2.51614995   5.4121298   7.0975432
141   YIL055C     -            2.0005314    5.3410327   4.6542324
142   YEL076W-C   -            13.2032684   5.3265661   8.0731092
143   YNR047W     -            4.44731559   5.3217828   6.1790059
144   YGL211W     -            4.00934024   5.2957602   5.5379668
145   YGL012W     (ERG4)       4.57738431   5.2945042   4.833773
146   YCL014W     (BUD3)       2.0970839    5.2855114   3.3963317
147   YBR106W     -            5.74228482   5.2537051   9.2061479
148   YHR095W     -            5.25923706   5.2434619   2.2066062
149   YEL010W     -            3.39547744   5.2424909   3.9026395
150   YBR005W     -            5.58242328   5.2283592   7.5591013
151   YPL183C     -            3.25331232   5.2150911   4.034456
152   YJL159W     -            5.95901062   5.2095163   5.0420867
153   YBL065W     -            5.04084137   5.1918263   10.287249
154   YDL071C     -            7.24874297   5.1844239   7.5184825
155   YGR197C     (SNG1)       6.43784806   5.17339     7.8870948
156   YLL028W     -            9.27382002   5.0519624   5.3421753
157   YKR034W     (DAL80)      3.91750209   5.0436172   7.2838566
158   YDR430C     -            2.19022255   5.0401778   3.1989703
159   YPL274W     -            5.4156341    5.0164198   6.1307085
160   YMR261C     (TPS3)       3.96385669   4.94376     3.7501015
161   YOL118C     -            3.20265396   4.936553    5.7544219
162   YOR005C     (DNL4)       4.47086248   4.8815521   3.6707508
163   YNL332W     -            3.33896215   4.8789948   4.7570682
164   YDR069C     (DOA4)       3.37810593   4.8769723   5.1947947
165   YOR009W     -            59.4543494   4.8708102   5.2948993



Gene Description 

Protein of unknown function 

Serine/threonine protein kinase of 
unknown function 
Protein of unknown function 
Sterol C-24 reductase 
Protein localized at the neck 
filament ring required for axial 
budding, may provide a memory of 
the previous bud site 

Protein of unknown function 
Protein of unknown function 
Protein of unknown function 
Protein of unknown function, has 
WD (WD-40) repeats 

Protein of unknown function 
Protein of unknown function 
Probable transport protein that 
confers resistance to MNNG and 
nitrosoguanidine
Member of major facilitator 
superfamily (MFS) multidrug- 
resistance (MFS-MDR) protein 
family 

GATA-type zinc finger 
transcriptional repressor for 
allantoin and 4-aminobutyric acid 
(GABA) catabolic genes 
Protein with similarity to Class I 
family of aminoacyl-TRNA 
synthetases 

Protein with similarity to GAP1P
and other amino acid permeases 
Component of the trehalose-6- 
phosphate synthase/phosphatase 
complex, alternate third subunit 
with TLS1P 

Protein of unknown function 
ATP-dependent DNA ligase IV, 
involved in non-homologous DNA 
end joining 

Ubiquitin-specific protease 
(ubiquitin C-terminal hydrolase) of 
the 26S proteasome complex, 
involved in vacuole biogenesis and 
osmoregulation 

Protein with similarity to members 
of the PAU1 family
Seq.                           CJ-4hr/
Num.  Clone ID    ALIAS        LP-4hr       K-50/CK     K-100/CK
166   YMR035W     (IMP2)       9.23409301   4.8492871   5.7664813
167   YER089C     (PTC2)       2.23920866   4.8455014   5.8687657
168   YJR018W     -            5.54754057   4.8389334   4.4934937
169   YLR088W     (GAA1)       3.1893544    4.814116    4.0142997
170   YOL163W     -            3.92239312   4.8014959   4.5124682
171   YLR462W     -            3.32915042   4.7928645   7.1350658
172   YLR098C     (CHA4)       2.05280928   4.7564347   5.5866465
173   YNR053C     -            2.55991235   4.7234659   3.8186389
174   YDL246C     -            2.43826188   4.6757263   3.5757353
175   YOL045W     -            3.55662236   4.672513    2.0538279
176   YKL176C     -            3.32695888   4.6429893   5.4538239
177   YJR114W     -            3.00664482   4.6389866   4.0045917
178   YER091C     (MET6)       6.67067887   4.6224571   2.9292597
179   YHL049C     -            5.15537247   4.5637645   9.5066446
180   YDR389W     (SAC7)       3.89197011   4.5609599   4.3143109
181   YMR202W     (ERG2)       9.58572292   4.5446614   5.575174
182   YBL019W     -            3.45990928   4.4694518   4.1655454
183   YGR287C     -            10.2933872   4.4595137   9.7718104
184   YJL082W     -            7.42175571   4.4522595   5.556901
185   YHR098C     -            2.51284975   4.4353708   4.3716652
186   YOR371C     -            2.47743776   4.4289804   5.2783501



Gene Description 

Inner membrane protease of 
mitochondria, acts in complex with 
IMP1P but has different substrate
specificity for removal of signal 
peptidase 

Protein serine/threonine 
phosphatase of the PP2C family 
Protein of unknown function 
Protein required for attachment of 
GPI anchor onto proteins, affects 
endocytosis 

Protein with weak similarity to 
Pseudomonas putida phthalate
transporter 

Protein of unknown function 
Zinc-finger protein required for 
activation of CHA1, has a ZN[2]-
CYS[6] fungal-type binuclear 
cluster domain 

Protein with similarity to human 
breast tumor-associated 
autoantigen 

Protein with similarity to SOR1P 
(SOR1 and YDL246C code for 
nearly identical proteins) 
Serine/threonine protein kinase of 
unknown function 
Protein of unknown function 
Protein of unknown function 
Homocysteine methyl transferase 
(5-methyltetrahydropteroyl 
triglutamate-homocysteine 
methyltransferase), methionine 
synthase, cobalamin-independent 
Protein with similarity to other 
subtelomerically-encoded proteins 
including YER189P, YML133P, 
and YJL225P, coded from a 
subtelomeric Y' region
GTPase-activating protein for
RHO1P

Sterol C8-C7 isomerase (C-8 sterol
isomerase), enzyme of the
ergosterol biosynthesis pathway

Protein with similarity to alpha-D-
glucosidase (maltase) (YGR287C
is nearly identical to FSP2 and
YIL172C)

Protein of unknown function 
Protein of unknown function 
Protein of unknown function 
Seq.                           CJ-4hr/
Num.  Clone ID    ALIAS        LP-4hr       K-50/CK     K-100/CK
187   YDR530C     (APA2)       2.40849553   4.3993312   2.7389073
188   YKL119C     (VPH2)       0.16462534   4.3613346   0
189   YOR273C     -            13.0544715   4.3469302   10.649131
190   YPL042C     (SSN3)       6.78272968   4.3344728   3.9578568
191   YGR268C     -            4.77373538   4.3329069   5.2744105
192   YPR011C     -            2.0077462    4.3123349   4.2742986
193   YPL022W     (RAD1)       4.48327554   4.3036056   6.5285426
194   YGL207W     (SPT16)      5.34289635   4.3033021   3.5727713
195   YGL167C     (PMR1)       4.12359747   4.2628564   4.8141347
196   YJR091C     (JSN1)       4.56429439   4.2419881   4.7804157
197   YDR238C     (SEC26)      4.48641405   4.2179222   3.8109695
198   YDL012C     -            2.90930997   4.2158147   2.0519798
199   YDR044W     (HEM13)      14.9283272   4.2136787   3.4946018
200   YGL114W     -            3.22707938   4.2023503   5.0073787
201   YGL055W     (OLE1)       2.29875509   4.1923045   3.5992372



Gene Description 

ATP adenylyltransferase II (AP4A
phosphorylase) 

Vacuolar H-ATPase (V-ATPase) 
assembly protein acting in the 
endoplasmic reticulum 
Protein with similarity to members 
of major facilitator superfamily 
(MFS) multidrug-resistance (MFS- 
MDR) protein family 
Cyclin-dependent serine/threonine 
protein kinase of the RNA 
polymerase II holoenzyme complex 
and Kornberg's mediator (SRB)
subcomplex 

Protein of unknown function 
Protein with similarity to human 
Grave's Disease carrier protein 
(SP:P16260) and to bovine 
homolog of Grave's Disease carrier 
protein (SP:Q01888) 
Component of the nucleotide 
excision repairosome, homolog of 
human XPF xeroderma 
pigmentosum gene product and the 
mammalian ERCC-4 protein 
General chromatin factor required 
for adequate expression of CLN 
and other genes 

CA++-transporting P-type ATPase 
of Golgi membrane involved in 
CA++ import into Golgi 
Protein that when overexpressed 
can suppress the hyperstable 
microtubule phenotype of TUB2- 
150 

Coatomer complex beta chain 
(beta-COP) of secretory pathway 
vesicles, required for retrograde 
transport from Golgi to 
endoplasmic reticulum 
Protein of unknown function 
Coproporphyrinogen III oxidase, 
oxygen-repressed, sixth step in 
heme-biosynthetic pathway
Protein with similarity to S. pombe 
ISP4 protein, member of the major 
facilitator superfamily (MFS) 
Stearoyl-COA desaturase (delta-9 
fatty acid desaturase), required for 
synthesis of unsaturated fatty acids 
Seq.                           CJ-4hr/
Num.  Clone ID    ALIAS        LP-4hr       K-50/CK     K-100/CK
202   YDL088C     (ASM4)       4.39685251   4.1757265   3.321034
203   YKL171W     -            2.64137608   4.1581147   8.2933538
204   YPL190C     -            5.94196213   4.1575162   3.202837
205   YMR140W     -            5.24432896   4.157179    5.4545409
206   YBL005W     (PDR3)       3.75060207   4.1449054   6.0827305
207   YML032C     (RAD52)      3.13968668   4.1330793   3.0832115
208   YFR018C     -            5.28886874   4.1041589   6.4001917
209   YGL125W     (MET11)      6.80542292   4.0762178   44382484
210   YCR057C     (PWP2)       3.34704165   4.0555292   3.4118145
211   YBL044W     -            4.67885642   4.0526493   9.1998322
212   YPL268W     (PLC1)       2.90633764   4.0372127   2.1993847
213   YOR204W     (DED1)       2.52920945   4.0291663   3.0830731
214   YPL171C     (OYE3)       4.94122983   4.0225239   0.2747214
215   YOR203W     -            3.473713     3.9900922   3.2232019
216   YNL295W     -            2.855052     3.9889367   2.1389666
217   YEL042W     (GDA1)       2.00067395   3.9856733   3.8058139
218   YLR339C     -            4.12619065   3.9844972   3.506939
219   YIL007C     -            5.11957134   3.9770005   2.318674
220   YIR007W     -            3.71972069   3.9670484   5.0861072
221   YER114C     (BOI2)       2.5967273    3.9643546   6.3042836
222   YLR092W     (SUL2)       6.02237117   3.9547891   4.5438793
223   YEL060C     (PRB1)       5.60961951   3.939317    4.7370327
224   YAL051W     -            2.40553928   3.9334781   4.5099518
225   YJR147W     -            2.05911726   3.9267937   5.1956856
226   YOR386W     (PHR1)       1.99823774   3.9204076   5.7096569



Gene Description 

Suppressor of temperature- 
sensitive mutations in POL3P 
(DNA polymerase delta) 
Serine/threonine protein kinase of 
unknown function 

Protein of unknown function 
Transcription factor related to 
PDR1P, contains a ZN[2]-CYS[6] 
fungal-type binuclear cluster 
domain in the N-terminal region 
Protein required for recombination 
and repair of X-ray damage, has a 
late function in meiotic 
recombination 

Protein with similarity to human 
glutaminyl-peptide cyclotransferase 



Protein of unknown function 
Phosphoinositide-specific
phospholipase C (1-
phosphatidylinositol-4,5-
bisphosphate phosphodiesterase I),
produces diacylglycerol and
inositol 1,4,5-trisphosphate
ATP-dependent RNA helicase of
dead box family involved in
protein synthesis
NADPH dehydrogenase (old
yellow enzyme), isoform 3
Protein of unknown function 
Protein of unknown function 
Guanosine diphosphatase of Golgi 
membrane 

Protein of unknown function 
Protein of unknown function 
Protein with similarity to 
endoglucanase 

Protein with SH3 domain involved 
in bud formation, binds to BEM1P 
High-affinity sulfate transporter 
Protease B (YSCB) (PRB)
(cerevisin), serine protease of the
subtilisin family with broad
proteolytic specificity



Deoxyribodipyrimidine photolyase,
involved in light-dependent repair
of pyrimidine dimers
Seq.                           CJ-4hr/
Num.  Clone ID    ALIAS        LP-4hr       K-50/CK     K-100/CK
227   YCR037C     (PHO87)      2.81200613   3.8982171   4.0311884
228   YOL100W     -            3.04632994   3.8854218   9.2592192
229   YBL047C     -            3.40597241   3.88363     5.2814809
230   YAR014C     -            2.53512963   3.823947    4.4065791
231   YKL182W     (FAS1)       3.75368336   3.8068781   5.3493259
232   YLR331C     -            3.94099666   3.795387    3.3715843
233   YEL031W     (SPF1)       7.77512435   3.7891074   4.357615
234   YHR078W     -            2.2941334    3.7838221   4.6151917
235   YPL155C     (KIP2)       3.29502679   3.7807978   9.392792
236   YNR074C     -            4.3061075    3.7638306   5.7991531
237   YMR303C     (ADH2)       4.56919214   3.7542967   3.1957867
238   YLR134W     (PDC5)       49450653     3.7528169   3.2704832
239   YKL067W     (YNK1)       4.49102455   3.7325797   3.6497934
240   YLR136C     (TIS11)      2.88004451   3.7255421   5.1711798
241   YDR443C     (SCA1)       2.75733315   3.7068432   69331513
242   YGL071W     (RCS1)       3.39203358   3.6963077   4.5310166
243   YBR293W     -            2.25740646   3.6840827   3.0384171
244   YMR324C     -            3.33053542   3.6802526   2.8779503
245   YFL051C     -            2.07690974   3.6611179   4.7476201
246   YBR276C     (PPS1)       2.35950244   3.6550406   3.4539593
247   YFL042C     -            3.57726533   3.6509118   4.5694594
248   YPL263C     (KEL3)       4.50871509   3.6484792   3.8498382
249   YLR188W     (MDL1)       5.00498919   3.6478982   4.3936321



Gene Description

Member of the phosphate permease
family of the major facilitator
superfamily
Serine/threonine protein kinase of
unknown function
Protein with similarity to
cytoskeletal protein USO1P,
PAN1P, and mouse tyrosine kinase
substrate EPS15
Protein of unknown function
Fatty-acyl-COA synthase, beta
chain (contains acetyl transferase,
enoyl reductase, dehydratase, and
malonyl/palmitoyl transferase)
Protein of unknown function
Protein with similarity to CA++-
transporting ATPases
Protein of unknown function, has 4
potential transmembrane domains
Kinesin-related protein
Protein with similarity to Bacillus
subtilis nitrite reductase (NIRB)
Alcohol dehydrogenase II, glucose-
repressed

Pyruvate decarboxylase isozyme 2
Nucleoside diphosphate kinase,
responsible for synthesis of all
nucleoside triphosphates except
ATP

Regulatory protein involved in
IRON uptake

Member of major facilitator
superfamily (MFS) multidrug-
resistance (MFS-MDR) protein
family

Protein with similarity to members
of the YBL108P/YCR103P/
YKL223P family
Protein with similarity to FLO1P
family of proteins
Protein tyrosine phosphatase
(PTPase) with dual specificity
Protein of unknown function, has
similarity to YHR080P
Protein with similarity to KEL1P
and KEL2P
ATP-binding cassette (ABC)
superfamily member, equivalent to
a "half-molecule" ABC protein
plus an ATP-binding domain
Seq.                           CJ-4hr/
Num.  Clone ID    ALIAS        LP-4hr       K-50/CK     K-100/CK
250   YPR021C     -            2.21061647   3.6466639   3.2312479
251   YKL138C     (MRPL31)     3.22538649   3.6454084   4.0488722
252   YNL148C     (ALF1)       3.33997835   3.6391378   5.2515594
253   YLR302C     -            10.5636377   3.6318383   0.3621924
254   YBR298C     (MAL31)      5.44502575   3.6302693   9.8311328
255   YAR044W     (OSH1)       4.12112011   3.624939    3.8839622
256   YLR120C     (YAP3)       6.14265883   3.6229845   4.4298562
257   YGR134W     -            2.8756723    3.6189405   1.9505784
258   YMR088C     -            3.01763425   3.574571    2.5742717
259   YDR291W     -            4.95353348   3.5637613   2.8803997
260   YJR017C     (ESS1)       2.98118086   3.5587415   3.2208256
261   YGL178W     (MPT5)       4.28561965   3.558276    3.3338238
262   YHR086W     (NAM8)       2.63503306   3.556686    4.1441189
263   YGR178C     (PBP1)       2.95926792   3.5559294   3.6095103
264   YBL022C     (PIM1)       4.1993836    3.5255118   3.2518435
265   YJL083W     -            3.34026267   3.5131828   5.6812601
266   YJR053W     -            2.12253894   3.5096202   4.5401430
267   YJL175W     -            6.11781731   3.5040684   3.7938536
268   YMR016C     -            3.67893179   3.4720987   3.3111279



Gene Description 

Protein with similarity to proteins
of the mitochondrial carrier (MCF)
family (GB:Z49274)
Mitochondrial ribosomal protein of
the large subunit (YML31)
Alpha-tubulin foldin, cofactor B
Protein of unknown function
High affinity maltose/H+
symporter (maltose permease)
member of the sugar permease
family

Protein implicated in ergosterol
biosynthesis, member of the
KES1/HES1/OSH1/YKR003W
family of oxysterol-binding
(OSBP) proteins
Transcription factor of the basic 
leucine zipper (BZIP) family, one 
of eight members of a novel 
fungal-specific family of BZIP 
proteins 

Protein of unknown function 
Member of major facilitator 
superfamily (MFS) multidrug- 
resistance (MFS-MDR) protein 
family 2 

Protein with similarity to SGS1P 
and other DNA helicases 
Processing/termination factor, 
involved in transcription 
termination or 3'-end processing of 
pre-MRNA 

Protein required for high 
temperature growth, recovery from 
alpha-factor arrest, and normal 
lifespan of yeast cells 
U1 SNRNA-associated protein,
essential for meiotic recombination 
and suppressor of mitochondrial 
splicing defects, has 3 RNA 
recognition (RRM) domains 
poly(A)-binding protein 
Serine protease required for 
intramitochondrial proteolysis and 
maintenance of respiratory 
function, related to E. coli ATP- 
dependent protease LA 
Protein with similarity to IRS4P 
Protein involved in efficiency of 
mating 

Protein of unknown function
Seq.                           CJ-4hr/
Num.  Clone ID    ALIAS        LP-4hr       K-50/CK     K-100/CK
269   YLL051C     (FRE6)       2.59520796   3.4643555   4.4566151
270   YJL212C     -            4.42206990   3.458335    4.0764022
271   YMR019W     (STB4)       3.2576922    3.4414621   3.397646
272   YHL047C     -            3.02606918   3.4089434   3.007693
273   YBR038W     (CHS2)       2.03060756   3.3885338   2.8509884
274   YLR023C     -            2.68880866   3.3876183   2.5555381
275   YPL009C     -            5.28314415   3.3856037   0.8412905
276   YGL008C     (PMA1)       2.09210526   3.3844005   3.725269
277   YMR033W     (ARP9)       3.08586194   3.3800103   2.9005564
278   YLR153C     (ACS2)       3.45528019   3.3785446   3.1285812
279   YLL061W     -            10.9366799   3.369477    3.0795633
280   YNL192W     (CHS1)       3.72575719   3.358192    3.7457248
281   YEL058W     (PCM1)       4.56623631   3.3482618   3.4456437
282   YLR099C     -            4.72209646   3.3290462   2.9774757
283   YDL057W     -            3.21787484   3.316811    4.4492894
284   YLR195C     (NMT1)       3.4535546    3.3142347   3.1727409
285   YAL005C     (SSA1)       3.48582964   3.3068323   2.9388227
286   YPL222W     -            2.79485782   3.2974442   2.9997551
287   YJL056C     -            2.36206747   3.2790296   3.3129634
288   YKR021W     -            2.33705924   3.269552    2.936447
289   YPL119C     (DBP1)       5.87199247   3.2464223   2.197366



Gene Description 

Protein with similarity to ferric 
reductase FRE2P 

Protein with similarity to S. pombe
ISP4+ which is induced by sexual 
differentiation 

SIN3P-binding protein, has ZN[2]-
CYS[6] fungal-type binuclear
cluster domain in the N-terminal 
region 

Member of major facilitator 
superfamily (MFS) multidrug- 
resistance (MFS-MDR) protein 
family 

Chitin synthase II, responsible for 
primary septum disk 
Protein of unknown function 
Protein of unknown function 
H+-transporting P-type ATPase of 
the plasma membrane, activity is 
rate-limiting for growth at low pH 
Protein with similarity to actin and 
actin-related proteins ARP1P- 
ARP10P 

Acetyl-COA synthetase (acetate-
COA ligase)

Protein with similarity to GAP1P
and other amino acid permeases 
Chitin synthase I, has a repair 
function during cell separation 
Hexosephosphate mutase 
(phosphoacetylglucosamine 
mutase) (N-acetylglucosamine- 
phosphate mutase), converts N-
acetyl-D-glucosamine 1-phosphate
to N-acetyl-D-glucosamine 6- 
phosphate 

Protein of unknown function 
Protein of unknown function 
N-myristoyltransferase, adds 
myristoyl group to N-terminal 
glycine of certain proteins 
Heat shock protein of HSP70 
family, cytoplasmic 
Protein of unknown function 

Protein of unknown function 
ATP-dependent RNA helicase of 
dead box family, suppressor of 
SPP81/DED1
Seq.
Num. 

290 



291 
292 
293 
294 
295 
296 



297 



298 
299 

300 



301 



302 
303 
304 
305 
306 
307 
308 



309 
310 
311 
312 



313 
314 
315 
316 
317 



Clone ID 

YGL014W



YER010C 

YJR151C 

YPL207W 

YER130C 

YNR065C 

YGL192W 



ALIAS 



YDR256C 
YDR208W 

YHR214W 



YAL028W 
YIR015W 
YMR308C 
YOR345C 



YPL193W 

YFR012W 

YPL205C 

YDR476C 

YCR052W 



(IME4) 



CJ-4hr/ 
LP-4hr 

3.11296478 



11.4713039 
41.4229667 
2.48831068 
2.01652303 
2.86361451 
2.89030953 



K-50/CK 

3.2294295 



3 2179542 
3 2130608 
3 2080219 
3 2075344 
3.2060768 
3 170381 



(CTA1) 
(MSS4) 



YLR249W (YEF3) 



YNL331C 

YPR115W 

YJL178C

YAR042W 

YDR015C 

YBL067C 

YHR072W 



(SWH1) 

(UBP13) 
(ERG7) 



(PSE1) 



(RSC6) 



3.44801501 

2.4843458 

2.60257256 

18.1940127 

0.09079169 

3.41427731 

3.5569619 



9.17485562 
2.80351347 
2.69422447 
5.73841888 



3.60415592 

3.31259823 

13.258257 

8.1273943 

2.2744649 



3 1185277 
3 1174643 
3 1121969 
3 0992362 
3 0861607 
3 0820393 
3 0809956 



3.0726043 
3.066482 
3.0659484 
3.0523183 



3.0500696 
3.0316711 
3.0208358 
3.0155987 
3.01 1243S 



K-100/CK 

3.6382821 



2.3807886 
4.5216913 
2.9864022 
3 3332725 
7 3945002 
6 4784105 



YMR047C (NUP116) 2.56622055 3.1702234 4.7742052



4.54027942 3 158248 8.0093186 
2.61164524 3 154316 3.0423151 

4.54013428 3 1513793 5 6325011 



3.59397167 3 1445334 2 631954 



6.5303136 
2.5667848 
2.7705734 
6.2488302 
0.3097629 

2 4717411 

3 4311189 



3 8109858 
3 4328314 
2.6409014 
2 2898958 



2 8450987 
0 

0 7999155 
3.8636781 
2.6017436 



Gene Description 

Protein with pumilio repeats that is 
involved with MPT5P in 
relocalization of SIR3P and SIR4P 
from telomeres to the nucleolus 
Protein of unknown function 
Protein of unknown function 
Protein of unknown function 
Protein of unknown function 
Protein with similarity to PEP1P
Positive transcription factor for 
IME1 and IME2, mediates control 
of meiosis by carrying signals 
regarding mating type (A/alpha) 
and nutritional status 
Nuclear pore protein (nucleoporin) 
of the GLFG family, may be 
involved in binding and 
translocation of nuclear proteins 
Catalase A (peroxisomal) 
Potential PIP 5-kinase, multicopy
suppressor of STT4 mutation 
Protein of unknown function 
(YAR066W and YHR214W code 
for identical proteins) 
Translation elongation factor EF-
3A, member of ATP-binding
cassette (ABC) superfamily 
Probable aryl-alcohol reductase 
Protein of unknown function 
Protein of unknown function 
Protein of unknown function 
Protein of unknown function 
Ubiquitin C-terminal hydrolase 
Lanosterol synthase, carries out 
complex cyclization step of 
squalene to lanosterol in ergosterol 
biosynthesis pathway 
Protein of unknown function 
Subunit of RNase P

Deoxycytidyl transferase involved 
in mutagenic translesion DNA 
synthesis 

Protein of unknown function 
Protein of unknown function 
Protein of unknown function 
Protein of unknown function
Component of abundant chromatin 
remodeling complex (RSC) 
Seq.

Num. Clone ID ALIAS 

318 YGL022W (STT3) 



CJ-4hr/ 
LP-4hr 



K-50/CK 



321 
322 



323 

324 

325 
326 
327 
328 
329 



330 
331 



333 



3.64275733 3.0050118 



319 YMR109W 

320 YHR032W 



YLR236C 

YOR337W (TEA1) 



YFR055W 

YHR212C 

YLR001C 

YOR034C 

YPR076W 

YKL183W 

YBR004C 



YJR071W 
YCR084C 



(TUP1) 



19.0544656 3 0044499 
9.30722933 2.9855823 



2.6190617 2 9810987 
2.13152473 2 9790715 



2.35867872 2 9771983 

4.01639255 2 9769438 

2.77031036 2 9663037 

3.38439363 2 9543526 

3.86182393 2.9410933 

2.9718977 2 9334031 

3.05485559 2.9257736 



3.39019477 2.924417 
2.40138822 2 9219843 



YKL148C (SDH1) 



334 YER044C 

335 YLR045C (STU2) 

336 YPL226W 



337 YHR161C 

338 YJR109C (CPA2) 



339 YGR250C 



K-100/CK 

3 8854905 



5 6886658 
4 5560581 



3 7681402 

4 8228581 



3 0622139 
4.4451423 

2 7628132 

2 5499862 

3 2728075 
5 2561547 
2 8905869 



1 768982 
3.2718264 



332 YFR030W (MET10) 33.6060485 2.9138815 2.0879079



2.72554507 2.9036242 2 5317298 



340 YLR149C 



3.6669641 2.9002716 2 6807728 
2.16969039 2.8946579 2.9923107 

2.45263084 2.8885678 2.5557944 



2.86345744 2.8873374 2,9469349 
4.31426739 2.8803515 3,1263529 



2.20914388 2.8752914 3.8774955 



3.39994503 2.8694003 4.6627573 



Gene Description 

Oligosaccharyltransferase subunit, 
member of a complex of eight ER 
proteins that transfers core 
oligosaccharide from dolichol 
carrier to Asn-X-Ser/Thr motif 

Protein of unknown function, 
member of the major facilitator 
superfamily (MFS)

TY1 enhancer activator of the 
GAL4P-type family of DNA- 
binding proteins 
Protein with similarity to E. coli 
cystathionine beta-lyase 
Protein identical to
YAR060P/RAA19P
Protein of unknown function 

Protein of unknown function 
Protein of unknown function 
Protein expressed between 3 and 6 
hours after transfer to sporulation 
medium 

Protein of unknown function 
General repressor of transcription 
(with SSN6P), member of WD 
(WD-40) repeat family 
Assimilatory sulfite reductase 
subunit, flavin-binding (alpha) 
subunit, part of the sulfate 
assimilation pathway 
Succinate dehydrogenase 
(ubiquinone) flavoprotein (FP) 
subunit, converts succinate + 
ubiquinone to fumarate + ubiquinol 
in the TCA cycle 
Protein of unknown function 
Component of the spindle pole 
body 

Protein with similarity to members 
of the ATP-binding cassette (ABC) 
superfamily 

Carbamoylphosphate synthase 

(glutamine-hydrolyzing) arginine- 

specific, large chain

Protein of unknown function, has 

three RNA recognition (RRM) 

domains 

Protein of unknown function 



Seq. 

Num. Clone ID ALIAS 

341 YCL057W (PRD1) 



342 YLR114C 



344 



345 
346 



347 



348 
349 

350 
351 

352 

353 
354 



355 
356 
357 

358 



CJ-4hr/ 

LP-4hr K-50/CK K-100/CK 

3.49569406 2.8641379 2.7495149 



2.27233205 2.8496505 1.8650501 



343 YML075C (HMG1) 2.71708812 2.8491957 3.2059005



YLR397C (AFG2) 



YJR019C (TES1) 
YBL008W (HIR1) 



YGL062W (PYC1) 



YPL244C 
YGL001C 

YMR302C (PRP12) 
YPL160W (CDC60) 



YLL024C 



YPL114W 
YPL221W 
YJR137C 



(SSA2) 



YEL077C 

YMR205C (PFK2) 



YKL164C (PIR1) 



359 YCL037C (SRO9)

360 YHR082C (KSP1) 

361 YPR074C 



2.56801854 2 8469125 2.7385515 



4.07777555 2 8303235 2.0724897 
7.24580603 2 8284713 2.8866813 



2.649771 



2 8279558 3.1059191 



3.43385233 2 82181 19 3.3274479 

3.91981575 2 8214816 1.9852785 

2.92335545 2 8146501 2.7190981 

2.25327101 2 8142723 1.7426948 

4.09160949 2 8142088 2.4784071 



3.20718793 2.8098429 
2.27470363 2 8050429 



4.16484234 2 7962162 
4.08515832 2 7886642 



(ECM17) 26.5435466 2 787597 



3.9054119 
2.2843952 



1.717967 
3.960997 
2.0763181 



2.11125363 2 7864791 2.3925674 

8.35007693 2 7855748 2.393588 

2.14499054 2 7799591 3.4962633 

3.19760669 2.771 1859 2.476508 



Gene Description 

Proteinase YSCD, saccharolysin,
contains zinc metalloendoprotease
motif HEXXH 

Protein with weak similarity in the 
C-terminus to drosophila 
melanogaster bicaudal-D protein 
3-hydroxy-3-methylglutaryl- 
coenzyme A reductase 1, rate 
limiting enzyme for sterol 
biosynthesis, converts HMG-COA 
to mevalonate 

Protein of the AAA family of 
ATPases, has similarity to 
mammalian valosin-containing 
protein (VCP) 
Acyl-COA thioesterase 
Histone transcription inhibitor, 
required for periodic repression of 
3 of the 4 histone gene loci and for 
autogenous repression of HTA1-
HTB1 locus by H2A and H2B
Pyruvate carboxylase 1, converts 
pyruvate to oxaloacetate for 
gluconeogenesis 
Protein of unknown function 
Protein with similarity to nocardia 
SP. cholesterol dehydrogenase 

Leucyl-TRNA synthetase, 
cytoplasmic 

Heat shock protein of HSP70 
family, cytoplasmic 

Phosphofructokinase beta subunit, 
part of a complex with PFK1P 
which carries out key regulatory 
step in glycolysis 
Protein of unknown function 
Protein of unknown function 
Putative sulfite reductase 
(ferredoxin) 

Protein required for tolerance to 
heat shock, member of the 
PIR1P/HSP150P/PIR3P family
Suppressor of YPT6 null and 
RHO3 mutations
Serine/threonine kinase that 
suppresses PRP20 mutant when 
overproduced 
Seq.

Num. Clone ID ALIAS 

362 YBR184W (MEL1) 



363 YOL157C

364 YFL066C



365 YLL029W

366 YJL198W



367 YDR088C (SLU7)



CJ-4hr/ 
LP-4hr 



K-50/CK 



5 06354303 2.7711448 



2 70064964 2.7668777 
2 94443276 2 753026 



2.22657399 2 7389102 
2.98124683 2 7343513 



369 YIL078W (THS1)



370 YGL113W - 

371 YMR086W - 

372 YGL233W (SEC15)

373 YGL144C 

374 YOR137C 

375 YJR143C (PMT4) 



376 YBR289W (SNF5) 



377 YNL240C 



378 YML013W -

379 YKL16SC 

380 YGL151W (NUT1)

381 YNL197C (WHI3) 



382 YMR192W 



K-100/CK 

3.5340388 



3.6204284 
3.5848427 



3 1468025 

4 6395823 



2 07293165 2 7339627 2 6876744 



368 YJR132W (NMD5) 3.2005363 2.7333821 3.208398



3.31778832 2 7330794 1.6123939 



2.33404789 2.7249323 3.3810122 
2 69384376 2 7191747 2 9840404 
2.61433498 2.7141295 2 7961427 



2.26752066 2 7069494 2 6236889 
3.14249753 2 7031211 4.9526236 
2 80130312 2.6954799 2 4879264 



2 00671327 2 6881295 3 038619 



5 13894557 2 685901 3 523963 



3 62672833 2.6831604 2 9292996 
2 43589311 2.0791837 3 1S96257 
2 47823061 2.6787971 2 5683618 



2 51493336 2.0704555 3 4233233 



2 18376269 2.6732126 2 7489187 



Gene Description 

Alpha-galactosidase (melibiase),
converts melibiose into galactose + 
glucose, converts melibiose to 
galactose and glucose 
Probable alpha-glucosidase 
Protein with similarity to other 
subtelomerically-encoded proteins 
including YIL177P, YHL050P, 
and YER190P 
Protein of unknown function 
Protein with strong similarity to 
PHO87P, member of the phosphate
permease family of the major 
facilitator superfamily (MFS) 
Pre-MRNA splicing factor 
affecting 3' splice site choice, 
required only for the second 
catalytic step of splicing 
Member of the karyopherin-beta 
family, possibly involved in 
nuclear transport 
Threonyl-TRNA synthetase, 
cytoplasmic, member of Class II 
family of aminoacyl-TRNA 
synthetases 

Protein of unknown function 
Protein of unknown function 
Component of exocyst complex 
required for exocytosis 
Protein of unknown function 
Protein of unknown function 
Mannosyltransferase (dolichyl 
phosphate-D-mannose:protein O- 
D-mannosyltransferase), involved 
in initiation of O-glycosylation 
Component of SWI/SNF global
transcription activator complex, 
acts to assist gene-specific 
activators through chromatin 
remodeling 

Protein with similarity to 
kluyveromyces MARX. LET1 
protein 

Protein of unknown function 

Protein that affects expression of 
HO 

Protein involved in regulation of 
cell size, has 1 RNA recognition 
(RRM) domain 

Protein with similarity to mouse 
TBC1 protein 
Seq.
Num. 

383 
384 



385 

386 
387 



389 

390 
391 
392 
393 



397

398 

399 
400 

401 
402 

403 
404 



405 
406 

407 



Clone ID 

YAL038W 
YEL075C 



YHR219W 



YIL137C 

YBL081W 

YOR171C 
YPL237W 

YHR142W 
YLL012W 

YFR025C 
YGR240C 



ALIAS 

(CDC19) 



CJ-4hr/ 
LP-4hr 



K-50/CK 



2 63679951 2 6714535 
5 12225893 2 6632638 



YJL069C 
YLR125W 

YML035C (AMD1) 



2 65731007 2 6517254 
6 28348756 2 642933 



2 23864371 



6401405 



YMR165C (SMP2) 2 58642399 2 631041: 



YDL223C 
YLR138W 
YAR020C 

YLR337C (VRP1) 



394 YLR060W (FRS1) 

395 YLL013C 

396 YIR003W 



3 16684859 2 6240147 

2 69090586 2.6158483 

3 79173888 2.6111125 
6 57336326 2 6027011 



2 61550639 2 5992071 
2 93447915 2.5901954 
2.41363594 2 5863745 



(SUI3) 



(HIS2) 
(PFK1) 



2 27968603 

2 34404421 

3.59659097 
2.5966981 

3,52383057 
3.25020683 

2.51 12362 
2.24103063 



2 5792356 

2 573205 

2 5718305 
2.5628077 

2.5597096 
2.550591 

2.5457991 
2.5388997 



YPL101W 

YOR127W (RGA1) 



4.18961695 2.5351198 
3.85804733 2.5316649 



K-100/CK 

2.7525692 
3.778537 



3 76398139 2 6619567 4.020734: 



2.6568606 
3.1335402 

1 6690608 

3 3572604 

2 2340877 

2 9821754 
2 0234222 

3 7504037 



1 9335426 
4 1297767 

2 874494 

1 9624104 

3 2079939 

2 4035974 
2 5479456 

2.9887896 
2.7451737 

2 8789156 
2 4938739 



2.6201803 
2 5697341 



YBR088C (POL30) 2.53837 IS 2.5276319 4.0628861



408 YBR295W (PCA1)

409 YCL044C 



4.16669535 2.525791 1 1221384 
2.35958836 2.519608 3 263571 



Gene Description 

Protein with similarity to other 
subtelomerically-encoded proteins
including YHL049P, YIL177P, 
and YJL225P 

Protein with similarity to other 
subtelomerically-encoded proteins
Protein of unknown function 
Protein of unknown function 
AMP deaminase, converts AMP to 
IMP and ammonia 
Protein whose deletion causes 
increased plasmid stability 
Protein of unknown function 



Proline-rich protein verprolin, 
involved in cytoskeletal 
organization and cellular growth 
Phenylalanyl-TRNA synthetase, 
alpha subunit, cytoplasmic 
Protein with similarity to 
drosophila pumilio protein 
Protein with similarity to E. coli 
and Bacillus subtilis mind, has 
potential coiled-coil region 
Protein with similarity to 
aminopeptidases 
Protein with Yl c /c identity to 
drosophila L not protein 

Translation initiation factor 
EIF2beta subunit 
Protein of unknown function 
Protein with similarity to human 
triacylglycerol lipase 
Histidinol phosphatase 
Phosphofructokinase alpha subunit, 
part of a complex with PFK2P 
which carries out a key regulatory
step in glycolysis 
Protein of unknown function 
RHO-type GTPase-activating
protein (GAP) for CDC42P 
Proliferating cell nuclear antigen 
(PCNA), required for DNA 
synthesis and DNA repair 
P-type copper-transporting ATPase 
Protein of unknown function 



Seq. 

Num. Clone ID ALIAS 

410 YBR110W (ALG1) 



411 YGR175C (ERG1) 



412 YLR116W 

413 YCR068W - 

414 YJR105W 

415 YKL157W (APE2) 



CJ-4hr/ 

LP-4hr K-50/CK K-100/CK 

2.23384099 2.5141215 3.2250999



6.02726287 2.5103577 1.8132661



2.98116702 2.5079761 3.9707409

3.32107678 2.4920381 3.6811994

2.20476096 2.4908887 1.7029385

2.18209838 2.4866194 2.093134



416 YFR009W (GCN20) 2.63782118 2.4859544 2.1613378



417 YDR211W (GCD6) 2.22567451 2.4835485 1.8240639



418 YAR060C 

419 YJL187C (SWE1)

420 YDR387C 

421 YDR251W (PAM1)

422 YJL172W (CPS1) 

423 YMR277W (FCP1) 

424 YDL047W (SIT4) 



4.88485967 2 482682 



425 YML117W 



426 YHR039C-A - 

427 YLL003W (SFI1) 



4 6114571 



2.01161328 2 4809757 2 6294797 

2.3225348 2 4746572 3 0481024 

2.09471237 2 4744652 2 3344613 

2.4464951 2 473092 2 228723 

2.51675116 2.466666 2.2346158 

2.40214S63 2 4572974 2 7529791 



L2473701 2 4482108 2.8054783 



2.49103418 2 4469729 1.7368373 
3.03031 186 2 4407012 2.2685901 



Gene Description 

Beta-mannosyltransferase involved 
in N-glycosylation (transfers MAN 
from GDP-MAN to DOL-PP- 
GLCNAC2) 

Squalene monooxygenase 
(squalene epoxidase), enzyme of 
the ergosterol biosynthesis pathway 

Protein of unknown function 
Protein with similarity to 
ribokinase 

Aminopeptidase II (YSCII), plays a 
nutritional role in releasing leucine 
from peptides externally cleaved at 
leucine 

Component of a protein complex 
required for activation of GCN2P 
protein kinase in response to amino 
acid starvation, member of ATP- 
binding cassette (ABC) 
superfamily 

Translation initiation factor EIF2B 
(guanine nucleotide exchange 
factor), 81 KDA (beta) subunit 
Protein identical to YHR212P, has 
a predicted mitochondrial transit 
peptide 

Serine/tyrosine dual-specificity 
protein kinase able to 
phosphorylate CDC28P on tyrosine 
and inhibit its activity 
Protein with similarity to ITR1P 
and ITR2P 

Coiled-coil protein and multicopy 
suppressor of loss of PP2A (genes 
PPH21, PPH22, and PPH3)
GLY-X carboxypeptidase YSCS, 
involved in nitrogen metabolism 
TFIIF-interacting component of the 
C-terminal domain phosphatase 
Protein serine/threonine 
phosphatase involved in cell cycle 
regulation, member of the PPP 
family of protein phosphatases and 
related to PP2A phosphatases 
Protein of unknown function, 
contains an ATP/GTP-binding site 
motif A (P-loop) 

Protein of unknown function 
Seq.

Num. Clone ID ALIAS 

428 YKR048C (NAP1) 



429 YOR197W - 

430 YEL046C (GLY1) 



431 YJL029C 

432 YOR233W (KIN4) 

433 YOR299W (BUD7) 

434 YHR218W - 

435 YGL026C (TRP5) 



436 YJL017W 

437 YNL161W - 

438 YOR141C (ARP8) 



439 YAL042W 

440 YGR270W (YTA7) 

441 YBR119W (MUD1) 

442 YDR052C (DBF4) 



CJ-4hr/ 

LP-4hr K-50/CK 

3.02222721 2 4404483 



2 876457 1 1 2 438206 
2.40664526 2.4369367



2.36823878 2.43429 



444 YDR285W (ZIP1) 

445 YJL047C 

446 YKL101W (HSL1) 



K-100/CK 

3.002619 



1.9784081 
2.6795853 



2.3384644 



3.52231883 2 4312627 3.0678435 

2.16058794 2 4312223 2.9585581 

2.37694362 2 4297245 4.1990669 

3.92053304 2 4267316 2.4752996 



2.745014 2 4179146 2.8613495 
4.7525671 2 4161417 2.324762 

5.68817037 2 4122798 1.7395537 



2 84377325 2 4057529 3.7961408 

2.68803581 2.4056715 1.945755 

2.83912216 2 4051525 1.3987642 

6.85835185 2 4036928 1.5834976 



443 YEL069C (HXT13) 2 6902010S 2 4013304 3.6711431 



8.03633767 2.3921886 0.2216256 

2.8960182 2,3885065 2.0814157 
4.2235071 2.37S02S6 2.44S5279 



Gene Description 

Nucleosome assembly protein that 
plays a role in assembly of histones 
into octamer, required for full
expression of CLB2P functions 
Protein of unknown function 
Protein required for glycine 
prototrophy in SHMT1 SHMT2 
double mutant 

Protein of unknown function, has 
similarity to C. elegans 
hypothetical protein T05G5.8 
Serine/threonine protein kinase 
related to KIN1P and KIN2P,
catalytic domain is most related to 
SNF1P 

Protein required for bipolar 
budding pattern 
Protein with similarity to other 
subtelomerically-encoded proteins 
including YHR219P and 
YFL065P, probable pseudogene 
Tryptophan synthase, last (fifth) 
step in tryptophan biosynthesis 
pathway 

Protein of unknown function 

Serine/threonine protein kinase of 

unknown function 

Protein with similarity to actin and 

actin-related proteins ARP1P- 

ARP10P 

Protein of unknown function, has 2 
potential transmembrane domains 
Protein with similarity to members 
of the AAA family of ATPases 
U1 SNRNP A protein (SNRNA-
associated protein) with 2 RNA 
recognition (RRM) domains 
Regulatory subunit for CDC7P 
protein kinase, required for G1/S
transition 

Protein with strong similarity to 
hexose transporters, member of the 
sugar permease family 
Structural protein of the 
synaptonemal complex central
element, has predicted coiled-coil 
domain 

Protein with similarity to clathrin
heavy chain in one domain 
Serine/threonine protein kinase that 
interacts genetically with histone 
mutations 
Seq. Num.  Clone ID  ALIAS  CJ-4hr/LP-4hr  K-50/CK  K-100/CK

447  YIL143C  (SSL2)  2.16202858  2.3668818  1.9944618
448  [illegible]  [illegible]  2.3653183  2.517131
449  YER189W  -  2.65287612  2.3630614  5.1724275
450  YLR194C  3.11287981  2.3617044  2.923307
451  YGR160W  2.13853989  2.3577684  1.8132562
452  YGR258C  (RAD2)  2.06944636  2.3572245  2.1751698
453  YGR162W  (TIF4631)  2.28099935  2.3554791  1.7039222
454  [illegible]
455  YGR124W  (ASN2)  3.37829988  2.3505148  2.4742017
456  YDL180W  2.20643197  2.3467843  1.8047293
457  YDR266C  3.29383065  2.3411759  2.3118864
458  YAR073W  -  7.67257484  2.3325262  1.6890618
459  YPL048W  (CAM1)  2.18528771  2.3294863  3.3106924
460  YEL030W  (ECM10)  1.99868799  2.3236082  2.3835153
461  [illegible]
462  [illegible]
463  YER110C  (KAP123)  3.02732098  2.3160572  1.8042941
464  [illegible]
465  YPL184C  3.3404012  2.3122475  2.2563666
466  [illegible]
467  [illegible]
468  [illegible]
469  YNL323W  2.29668952  2.3038327  2.06459S5
470  YBL076C  (ILS1)  2.31635893  2.3036041  1.7634202
471  YLR217W  2.57939547  2.2859565  1.6611523
472  YGR294W  8.48668724  2.2857763  1.7132102
473  YDL070W  2.16064033  2.2854538  3.7599153




Gene Description 

DNA helicase component of RNA
polymerase transcription initiation 
factor TFIIH (factor B) 

Protein with similarity to 
subtelomerically-encoded proteins 
including YIL177P, YHL049P, 
and YJL225P 

Protein of unknown function 
Protein of unknown function 
Structure-specific single-stranded 
DNA endonuclease of the 
nucleotide excision repairosome 
MRNA CAP-binding protein 
(EIF4F) 150K subunit
Possible ubiquitin-protein ligase 
(E3) 

Asparagine synthetase (L-aspartate: 
L-glutamine amidoligase [AMP-
forming]), ASN1P and ASN2P are
isozymes 

Protein of unknown function 
Protein of unknown function 
Protein with strong similarity to 
PUR5P, may be an inosine-5'- 
monophosphate dehydrogenase 

Protein possibly involved in cell 
wall structure or biosynthesis 
Protein with similarity to 
neurospora crassa O- 
succinylhomoserine (thiol)-lyase 
ATP-sulfurylase (sulfate 
adenylyltransferase) 
Karyopherin-beta, involved in 
nuclear import of ribosomal 
proteins 

Pseudouridine synthase 
Protein of unknown function 
Enolase 1 (2-phosphoglycerate 
dehydratase), converts 2-phospho- 
D-glycerate to 

phosphoenolpyruvate in glycolysis 
Protein of unknown function 
Protein with an SH3 domain that
affects actin distribution and 
bipolar budding 

Protein with similarity to YCX1P 
Isoleucyl-TRNA synthetase 
Protein of unknown function 
Protein of the PAU1 family 
Seq.
Num. 

474 

475 



476 
477 



478 
479 

480 
481 

482 
483 



484 

485 



486 



487 



488 
489 

490 
491 



492 

493 
494 



Clone ID ALIAS 

YOL044W - 
YGL145W (TIP20) 



YLR044C (PDC1) 
YNR013C 



YML049C 
YDR221W 

YMR135C 
YKR001C 



YGL179C 



YOL017W 
YHR189W 



YGR262C 



CJ-4hr/ 

LP-4hr K-50/CK 

2.15373467 2.2849315 

4.18903489 2.2829973 



2.21772333 2.2774972 
2.0080141 2.2770842 



(VPS1) 



YLR413W - 
YDR122W (KIN1) 



2.08547393 2 2761395 

2 53153283 2 2731861 

4 75727106 2 2636411 

2 48277065 2 2630712 

2.80009402 2 2629262 

2.0434064 2 2623436 



YIL154C (IMP2") 2 216207 2 2548739 
YKL068W (NUP100) 2.2598003 2.2529093 



YNL208W - 
YHR041C (SRB2) 



YBR229C (ROT2) 2 45186053 2.2034499 



83275613 2.2029336 



K-100/CK 

2 1736446 
1 6161221 



1 9431592 

2 4893728 



2 0329879 
1 8131644 

4 3609747 

1 5678763 

2 3695083 
2 3432635 



2 2466776 
2 7012733 



YHR190W (ERG9) 2.81318531 2.2475123 1.7238705



4 83814707 2.2398396 3.8786749 



3.01741322 2.2303862 2 2459064 

2 0021212 2.22911 2 2289936 

3.64860898 2.2181817 2 5247363 

2 27216109 2.2178582 2.4847273 



YPR080W (TEF1) 2.50402057 2.2115095 1.8587879



2 1844611 

1 9251756 



Gene Description 

Cytoplasmic protein that interacts 
physically with SEC20P, required 
for ER to Golgi transport 
Pyruvate decarboxylase isozyme 1 
Protein with similarity to PHO87P
and YJL198P, member of the 
phosphate permease family of the 
major facilitator superfamily 
(MFS) 

Protein with similarity to the beta 
subunit of human glucosidase II 
Protein of unknown function 
Vacuolar sorting protein, member 
of the dynamin family of GTPases 
Protein of unknown function 
Serine/threonine protein kinase, 
related to KIN2P and S. pombe 
KIN1 

Nuclear pore protein (nucleoporin) 
of the GLFG family, may be 
involved in binding and translocation
of proteins during 
nucleocytoplasmic transport 
Squalene synthetase (farnesyl- 
diphosphate farnesyltransferase), 
acts at a branch point in the 
isoprenoid biosynthesis pathway 
Serine/threonine protein kinase 
with similarity to ELM1P and
KIN82P 

Protein of unknown function 
Putative peptidyl-TRNA hydrolase 
(PTH) 

Protein of unknown function 
Component of the RNA 
polymerase II holoenzyme and 
Kornberg's mediator (SRB)
subcomplex 

Translation elongation factor EF- 
1 alpha (TEF1 and TEF2 code for 
identical proteins) 
Catalytic (alpha) subunit of 
glucosidase II 

Protein with similarity to apple tree 
calcium/calmodulin-binding 
protein kinase PIRJQ2251 
Seq.

Num. Clone ID 

495 YER144C 



496 



497 
498 
499 

500 
501 



502 



503 



504 



505 
506 
507 
508 



509 



510 

511 
512 
513 



YDR264C 



YLR427W
YLR374C 
YMR092C 

YDR294C 
YMR296C 



ALIAS 

(UBP5) 



(AKR1)



CJ-4hr/
LP-4hr 



K-50/CK 



(AIP1)



(LCB1) 



YKR039W (GAP1) 



YDR422C (SIP1) 



YMR080C (NAM7) 



YBL106C 
YEL043W 

YBR222C (FAT2) 
YDR004W (RAD57) 



YHR174W (ENO2)



YER043C (SAH1) 

YKR012C 
YOL007C 

YMR220W (ERG8)



3 381260S9 2.1994294 



2 24985411 
2.26923061 



2.1938243 
2.1927227 



2.2241966 2 1917074 

2.20085342 2.1899557 
2 1334221 2 1891645 



2 38138747 2 1809814 

3.44125375 2.1784956 

5 13679804 2.1781103 

2 26389978 2 1754266 



2 38714668 2 1702816 



3.73200717 2 1669937 

2.4135S469 2 1555775 

3.17872347 2 1529948 

2.68S16133 2 1489328 



K-100/CK 

2.7106303 



3 13151279 2.1983967 2 7516536 



2 5059695 
2 6395044 
2 1939749 

2 3333139 
1 9030014 



1.99105648 2 1881751 12556866 



2 62373247 2 1870761 2 0836347 



2 82340116 2 1828046 2 1714828 



2 7798326 
2.8076042 
3.0936394 
2.076582 



9697417 



1 6246235 

1 1414615 
1.2712945 
2.0693924 



Gene Description 

Ubiquitin-specific protease 
(ubiquitin C-terminal hydrolase), 
homologous to DOA4P and human 
TRE-2 

Ankyrin repeat-containing protein 
that has an inhibitory effect on 
signaling in the pheromone 
pathway 

Protein of unknown function 
Protein of unknown function 
Actin interacting protein, has 4 
WD (WD-40) repeats 

Component of serine C- 
palmitoyltransferase, first step in 
biosynthesis of long-chain base 
component of sphingolipids 
General amino acid permease, 
proton symport transporter for all 
naturally-occurring L-amino acids, 
4-aminobutyric acid (GABA), 
ornithine, citrulline, some D-amino 
acids, and some toxic analogs 
Multicopy suppressor of SNF1,
related to GAL83P/SPM1P and 
SPM2P 

Protein involved with NMD2P and 
UPF3P in decay of MRNA 
containing nonsense codons 

Protein of unknown function 
Peroxisomal AMP-binding protein 
Component of recombinosome 
complex involved in meiotic 
recombination and recombinational 
repair, with RAD55P promotes 
DNA strand exchange by RAD51P
recombinase

Enolase 2 (2-phosphoglycerate 
dehydratase), converts 2-phospho- 
D-glycerate to 

phosphoenolpyruvate in glycolysis 
Adenosylhomocysteinase (S- 
adenosylhomocysteine hydrolase) 
Protein of unknown function 

Phosphomevalonate kinase, 
converts mevalonate-5-phosphate 
to mevalonate pyrophosphate, 
involved in isoprene and ergosterol 
biosynthesis pathways 
Seq.

Num. Clone ID ALIAS 

514 YDR062W (LCB2) 



515 YAL048C 



516 
517 



518 



YBL111C 
YJL108C 



YJL141C 



(YAK1) 



519 YJL102W (MEF2) 

520 YDL174C (DLD1) 

521 YMR011W (HXT2) 

522 YLR129W (DIP2)

523 YML008C (ERG6) 



524 YGL245W 



525 YGL024W - 

526 YHL027W (RIM101) 

527 YGR281W (YOR1)



528 YIL175W 

529 YHL019C (APM2) 

530 YAL019W (FUN30) 

531 YGL112C (TAF60) 



CJ-4hr/ 

LP-4hr K-50/CK K-100/CK 

2.54448949 2.1430094 19627647 



5 02313141 2.1384748 4 221132 



2.17340644 
4,56646166 



2.1313903 
2.1302533 



7.25080973 
3 361 15373 
2.51872662 



2.1 188378 
2.1 126408 
2 1091092 



2 30162026 2.1065078 



2.67631735 2 1046757 
2.57210755 2 1033157 

4.18259907 2 0935061 



2 10803474 2 0859355 

1 99S670S 2 0848729 

5 19927199 2.0806959 

2 21463331 2.0765265 



3 8030907 
2 8609713 



2.80000608 2,1277388 2 8291776 



2.08592026 2.1220696 1.6098307 



2.28050309 2.1220649 2 4305801 



1.6420019 
2 0325416 
1 7889829 



4267053 



1 4610387 

2 5927892 

2 3634092 



2 4771166 

3 0618718 

1 7340212 

2 1891308 



Gene Description 

Subunit of serine C- 
palmitoyltransferase, first step in 
sphingolipid biosynthesis, and
suppressor of calcium-sensitivity of 
CSG2 

Protein with weak similarity to 
RAS1P, RAS2P, and other GTP-
binding proteins of the RAS
superfamily 

Protein of unknown function, 
contains 8 potential transmembrane 
domains 

Serine/threonine protein kinase, 
negative regulator of cell growth 
acting in opposition to CAMP- 
dependent protein kinase A 
Mitochondrial translation 
elongation factor, promotes GTP- 
dependent translocation of nascent 
chain from A-site to P-site of 
ribosome 

D-lactate dehydrogenase 
(cytochrome), (D-lactate 
ferricytochrome C oxidoreductase) 
(D-LCR), mitochondrial 
High-affinity hexose transporter, 
member of sugar permease family 
DOM34P-interacting protein, has 
WD (WD-40) repeats 
S-adenosylmethionine delta-24- 
sterol-C-methyltransferase, carries 
out methylation of zymosterol as 
part of the ergosterol biosynthesis 
pathway 

Glutamyl-TRNA synthetase, 
member of the Class I aminoacyl
TRNA synthetase family 
Protein of unknown function 
Zinc-finger protein involved in 
induction of IME1 
Oligomycin-resistance factor,
member of the ATP-binding 
cassette (ABC) superfamily 

Clathrin-associated protein (AP)
complex, medium subunit 

Component of TAF(II) complex 
(TBP-associated protein complex) 
required for activated transcription 
by RNA polymerase II 
Seq.

Num. Clone ID 

532 YNL218W 



ALIAS 



CJ-4hr/ 
LP-4hr 



K-50/CK 



533 
534 
535 
536 



537 
538 



539 
540 



YML058C-A - 

VOL156W (HXT11) 

YGR218W (CRM1) 

YGR296W - 



YLR176C 

YDL229W (SSB1) 



YER034W 

YKR050W (TRK2) 



541 YIL113W 

542 YCR023C 



543 YMR069W - 

544 YAL020C (ATS1) 



545 YNL256W 

546 YMR124W 



547 YOR162C 

548 YOR353C 



2 28887465 2 0761 



K-100/CK 

1 6749939 



217.969407 2 0723568 

5 12784966 2 0709411 2 2192118 

2 32581989 2 0675233 1 5505702 

3 15948331 2 0664535 3 7402619 



2 54329087 2 0627475 1 4892288 
5 21935107 2 0615889 2 0067653 



2.57654853 2 0562947 1.9025056 
2 23638067 2 056259 4 703529 



7.07756282 2 0539759 2.28618 
2 01851078 2 0520751 2.2109695 



4 45745957 2 0520592 0 

3 02597511 2 050802 2 0781706 



3 16308725 2 045577 18697374 
2 65610298 2 0431312 2 2988806 



2 4478098 2 0361958 2 1075035 
2 20965265 2 0220258 17471747 



549 YPL028W (ERG10) 2 86559138 2 0185951 16989337 



550 YIL114C (POR: 



551 YDL029W (ACT 2) 

552 YDL143W (CCT4) 



553 YPL267W 

554 YOL105C 



2 24322702 2 0152799 2 367678 



2.07186888 2 0140172 1810394 
2 3041307 2 0128325 1 6478427 



2 06501413 2 0119076 1.6761922 
2 79225712 2.0026061 2.23737 



Gene Description 

Protein with similarity to E. coli 
DNA polymerase III gamma and 
TAU subunits 
3.2869214 

Low- affinity glucose permease 
Exportin, beta-karyopherin 
Protein with similarity to other 
subtelomerically-encoded proteins 
including YER190P (YPL283 and 
YGR296W code for identical 
proteins) 

Heat shock protein of HSP70 
family involved in the translational 
apparatus 

Protein of unknown function 
Potassium transporter of the 
plasma membrane, moderate 
affinity, member of the potassium 
permease family of the major 
facilitator superfamily 
Dual-specificity protein 
phosphatase 

Member of major facilitator 
superfamily (MFS) multidrug- 
resistance protein family 2 
Protein of unknown function 
Protein with similarity to human 
RCC1 protein, suppressor of 
mutations in alpha tubulin 
Protein with similarity to bacterial 
dihydropteroate synthase 
Protein of unknown function, has 
potential coiled-coil region 
(GB:Z49273) 

Protein with weak similarity to 
adenylate cyclases 
Acetyl-COA acetyltransferase 
(acetoacetyl-COA thiolase), first 
step in mevalonate/sterol pathway 
Outer mitochondrial membrane 
porin (voltage-dependent anion- 
selective channel) 

Component of chaperonin- 
containing T-compiex (TCP ring 
complex, TRIO, homologous to 
mouse CCT4 

Protein of unknow n function 



Seq. 
Num. 

555 



556 



557 
558 
559 
5(>0 



561 
502 



563 
564 
565 

566 
567 



568 



569 
570 

571 
572 
573 
574 

575 

576 



577 



578 



Clone ID 

YML004C 



YMR266W 



YPL194W 
YOR152C 
YDR242W 
YFL054C 



YAR068W 
YAL001C 



YLR454W 
YDL020C 
YMR225C 

YJR038C 
YDR380W 



ALIAS 

(GLOl) 



CJ-4hr/ 

LP-4hr K-50/CK 

2.19630894 2.0015677 



2,47393267 1.991 1* 



YGR248W 
YER058W 

YBR039W 
YDL102W 
YJR153W 
YMR188C 

YBR244W 

YDR523C 

YDL031W 



(AMD2) 



2.87006368 
2 74047761 
8 28951711 
7 43223753 



0.4961465 
0.4915256 
0.4819032 
0.4793582 



3.24259317 0.479021 
(TFC3) 2.94740587 0.4742746 



(SON1) 
(MRPL44) 



5.72921213 
2,27378766 
0,19372389 

9.06373624 
0.1136124 



0.4716283 
0.4591519 
0.4430311 

0.4422872 
0.4417559 



(SOL4) 
(PET117) 

(ATP3) 
(CDC2) 



0.17664863 
0.18996331 

0.18787084 
17.5853214 
3.65551445 
0.20743995 



0,4293198 
0.4289442 

0.4197886 
0.4169873 
0.4116558 
0.4113817 



(SPS1) 



0.16093632 0.4035137 
10.8815611 0.4014712 



K-100/CK 

1 7985136 



1 727182 



1 5346869 
0 2221023 
0 9215489 
0 6136582 



1 2297001 
1 1915566 



1.641906 

0.8208918 

0.4019617 

4.1801655 
0.8241167 



YKL170W (MRPL38) 0.20347891 0.4296401 0 4533368 



0 4062793 
0 4202828 

0 2837245 
0 0258767 
0 6086987 
0 3381207 

0 3438917 

0 3371725 



2.0561223 0.3989968 0 6791327 



Gene Description 

Glyoxalase I, converts 
methylglyoxal and glutathione into 
S-D-lactoylglutathione 
Protein of unknown function, 
probable integral membrane 
glycoprotein 

Protein of unknown function 
Protein with similarity to amidases 
Protein with similarity to FPS1P 
and YPR192P, member of MIP 
family of transmembrane channels 
Protein with similarity to ICWP 
protein 

RNA polymerase transcription 
initiation factor TF1IIC (TAU), 138 
KDA subunit 

Protein of unknown function 

Mitochondrial ribosomal protein of 
the large subunit (YMR44) 
Protein of unknown function 
Protein with similarity to pyruvate 
decarboxylase, pyruvate oxidase, 
acetolactate synthase (large 
subunit), and other enzymes that 
require thiamine pyrophosphate 
Mitochondrial ribosomal protein of 
the large subunit (YML38) (E. coli 
L14), belongs to the L 14 family of 
pro kary otic ribosomal proteins 
Protein of unknown function 
Protein involved in assembly of 
cytochrome oxidase 
Fl -gamma ATP synthase 



Protein with similarity to 30S 
ribosomal proteins (SI 7) 
Protein with similarity to 
glutathione peroxidase 
Serine/threonine protein kinase 
involved in middle/late stage of 
meiosis 

Protein with similarity to RNA 
helicases of dead/DEAH box 
family 



YER109C (FLQ8B) 2.33584341 0.3826502 



0529509 
Seq.






CJ-4hr/ 






Num. 


f| nnP in 


A! IA^ 


I P Jhr 
L,r -*4Iir 


rv-.?u/ r\. 


ix ~ i ii v/ v.. rv 


579 


YIR017C 


(MET28) 


2.97658904 


0.3775372 


0.3008895 


580 


YDL016C 




3.9417341 


0 374232 


0.2672688 


581 


YIR028W 


(DAL4) 


2,5006493 


0 3741716 


3.0010653 


582 


YOR124C 


(UBP2) 


2.8382974 


0 3622925 


0.3859773 


583 


YBL108W 




0.1900473 


0 3575329 


0,5467376 


584 


YDR259C 




5.62713626 


0 3429355 


0.2335082 


585 


YDR253C 


(MET32) 


2.86314943 


0 3397175 


0,3279043 


580 


YJL196C 


(ELOl) 


0 17135352 


0 3378086 


0.3752547 


JO / 


I UK 1 4 iv_ 




n no i ^ossj. 




0 1 0Q^0">0 

U. 1 U7J J <L J 


JOO 


1 DKUUVL 


i V A P 1 ^ 


J.UlOl U J> O 






589 


YOR314W 


- 


2 65430513 


0 2917342 


0.3312621 


son 


I ULUUO VV 




0 1 I SSO I 76 


0 ^6S4 1 OS 


0, 1521 109 


591 


YPL136W 


- 


2.17418921 


0.2530647 


3.1708409 


sen 


1 ULuj4L 




0 Ml I 70S 

U HI 1 / 7 J 


n ">s">dfno 


\J S> 1 *- J *+ J J 




YT T> 1 A '"J W 
I LK 1 OJ. W 




A 1 "1 AO AAA 7 


U —J UJOj 


U UOJIJ J ~ 


594 


YMR193C-A 




3 34099753 


0.2354896 


0 3596816 


595 


YMR146C 


(TIF34) 


5 0351989 


02248204 


0 7193538 


'U 


v ft n l ~> w 

I rLU i — \ > 




7 1 OJ. V-uiOS 


0 ">">^S ^71 


1 52 15902 


5^7 


YER096W 




7.21258235 


0 1766673 


0.4170679 


598 


YNR071C 


- 


2.01488788 


0 1446196 


0.0535063 


599 


YLR419W 


_ 


0.20769335 


0 1102431 


0.9141258 


OUU 


V'L r I | AsP 




1 1 72 1 1 A 7 7 1 




j .Uc00"0 


601 


YLR142W 


(PUT1) 


2.2907881 


0 0854218 


0.6671487 


002 


YDL239C 




7.81000565 


0 041773S 


0.347901 


003 


YHR137W 


(AR09) 


0.07724918 


0.0347684 


0.0703134 


004 


YDR374C 




17.25276 


0 


4.6059079 



Gene Description 

Transcriptional activator of the 
basic leucine zipper (BZIPj family, 
works with MET4P and CBF1P to 
regulation sulfur amino acid 
metabolism 

Protein of unknown function 
Allantoin permease, member of the 
uracil/allantoin permease family of 
the major facilitator superfamily 
(MFS) 

Ubiquitin-specific protease 
(ubiquitin C-terminal hydrolase), 
cleaves at the C-terminus of 
ubiquitin 

Protein of unknown function 

Zinc-finger protein involved in 
transcriptional regulation of 
methionine metabolism 
Fatty acid elongation protein 
involved in elongation of 
tetradecanoic acid to hexadecanoie 
acid 

Protein of unknown function, 

member of the major facilitator 

superfamily (MFS) 

Amino acid permease for valine, 

leucine, isoleucine, tyrosine, and 

tryptophan 

Protein of unknown function 
Protein of unknown function 
Protein of unknown function 
Protein of unknown function 
Protein of unknown function 

Translation initiation factor EIF3, 
P39 subunit, has 2 WD (WD-40) 
repeats 

Protein of unknown function 

Protein with similarity to 
UDPglucose 4-epimerase 
Protein with similarity to several 
pre-MRNA splicing factors 
Protein of unknown function 
Proline oxidase, first step in 
synthesis of glutamate from proline 
Protein of unknown function 
Aromatic amino acid 
aminotransferase II 
Protein of unknow n function 
Seq. Num.  Clone ID  ALIAS  CJ-4hr/LP-4hr  K-50/CK  K-100/CK  Gene Description

605  YIL100W  -  9.97598883  0  2.8122773  Protein of unknown function, questionable ORF
606  YPL025C  -  9.52247441  0  20.22382  Protein of unknown function
607  YOR072W  -  7.48662389  0  6.2287404  Protein of unknown function
608  YNL242W  -  6.47720448  0  2.5753253  Protein of unknown function
609  YIR027C  (DAL1)  5.64113227  0  0  Allantoinase, first step in the degradation of allantoin as a secondary nitrogen source
610  YOR139C  (SFL1)  5.46648132  0  11.760995  Transcription factor with domains homologous to MYC oncoprotein and yeast HSF1P, required for normal cell surface assembly and flocculence
611  YEL019C  (MMS21)  3.34008483  0  2.069236  Protein involved in DNA repair
612  YDL132W  (CDC53)  3.16426832  0  0.1467847
613  YOR177C  -  2.97842594  0  0.435871  Protein of unknown function
614  YML042W  (CAT2)  2.76437696  0  16.65885  Carnitine O-acetyltransferase, peroxisomal and mitochondrial
615  [illegible]  (MEI4)  2.5971776  0  0  Protein required early in meiosis for meiotic recombination, chromosome synapsis, and viable spore formation
616  YGR083C  (GCD2)  2.32134339  0  [illegible]  Translation initiation factor EIF2B (guanine nucleotide exchange factor), 71 KDA (delta) subunit
617  YAR030C  -  2.06301879  0  0  Protein of unknown function, probable non-coding ORF
618  YJR157W  -  0.2073771  0  0.6711879  Protein of unknown function
619  YHR217C  -  0.2061042  0  0.549346  Protein of unknown function
620  YKL100C  -  0.12715731  0  40.399169  Protein of unknown function



* Table Headings:

Clone ID : A clone ID designation number.

Alias : Alternative gene names used in the literature. This information is provided
by YPD™, Hodges et al., Nucl. Acids Res. 27:69-73 (1999), the entirety of
which is herein incorporated by reference.

CJ-4hr/LP-4hr : Expression level in the mutant CJ517 as compared with the
respective wild type strain LPY9 at 4 hr sampling of log phase growth of yeast
(ratio of mutant expression level/control expression level). CJ refers to the
mutant CJ517 (the mutant is defective in the gene (ERG11) that codes for the C14
demethylase enzyme in the sterol biosynthetic pathway). LP refers to the
respective wild type strain LPY9, used to compare the gene expression profile
with the mutant.

K-50/CK : Expression level in the wild type yeast LPY9, at 2 hr after treatment
with 50 microgram/ml ketoconazole as compared to the wild type LPY9 strain
without ketoconazole treatment (ratio of treatment expression level/control
expression level). K refers to ketoconazole treatment. The clones listed in Table
2 are either up or down regulated in the mutant (CJ517) as well as in response to
ketoconazole treatment.

K-100/CK : Expression level in the wild type yeast LPY9, at 2 hr after treatment
with 100 microgram/ml ketoconazole as compared to the wild type LPY9 strain
without ketoconazole treatment (ratio of treatment expression level/control
expression level).

Gene Description : Description of the clone listed in column 1.
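
The ratio columns defined in the table headings can be illustrated with a short sketch. The function names, the sample intensities, and the two-fold cutoff are assumptions made for illustration only; the headings define the ratio (test expression level/control expression level) but do not state a cutoff.

```python
def expression_ratio(test_level: float, control_level: float) -> float:
    """Ratio of test (mutant or treated) expression to control expression."""
    if control_level <= 0:
        raise ValueError("control expression level must be positive")
    return test_level / control_level


def classify_regulation(ratio: float, fold_cutoff: float = 2.0) -> str:
    """Label a ratio as 'up', 'down', or 'unchanged'.

    fold_cutoff is a hypothetical threshold chosen for this sketch,
    not a value stated in the table headings.
    """
    if ratio >= fold_cutoff:
        return "up"
    if ratio <= 1.0 / fold_cutoff:
        return "down"
    return "unchanged"


# Hypothetical signal intensities for one clone in CJ517 and in LPY9.
ratio = expression_ratio(test_level=600.0, control_level=200.0)
print(ratio, classify_regulation(ratio))
```

A ratio above 1 indicates higher expression in the mutant or treated sample than in the control, and a ratio below 1 indicates lower expression.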
Table 3, below, lists the RNAs from Table 2 that correspond to genes or structural 
regions implicated in transcription regulation. 

Table 3*

Seq. Num.  Clone ID  ALIAS  CJ-4hr/LP-4hr  K-50/CK  K-100/CK  Gene Description

30  YOR237W  (HES1)  134.648161  1417.62621  1358.12348  Protein implicated in ergosterol biosynthesis, member of the KES1/HES1/OSH1/YKR003W family of oxysterol-binding (OSBP) proteins
42  YDR213W  -  18.2079478  32.1360646  58.3586116  Protein with similarity to transcription factors, has ZN[2]-CYS[6] fungal-type binuclear cluster domain in the N-terminal region
74  YGR177C  (ATF2)  3.7081426  11.830167  12.5552685  Alcohol O-acetyltransferase
75  YFR034C  (PHO4)  14.81120S3  11.2160731  20.8445145  Basic helix-loop-helix (BHLH) transcription factor required for expression of phosphate pathway, hyperphosphorylation by PHO80P-PHO85P cyclin-dependent protein kinase complex causes inactivation
Seq. CJ-4hr/

Num. Clone ID ALIAS LP-4hr K-50/CK 

83 YOL067C (RTG1) 30.4142081 10.0270648 



131 

132 
157 



K-100/CK 

27.3663295 



100 YJL127C (SPT10) 4.01528284 7 83944269 10.0960266 



111 YMR037C (MSN2) 6.80686734 6 42359685 7.66129891 



YCR048W (ARE 1 ) 

YAL013W (DEP1) 

YIL084C (SDS3) 

YKR034W (DAL80) 



9.11370518 

8.79366086 

1.99582364 
3.91750209 



6 1039374 

5 54633863 

5 54306878 
5 0436172 



10.5312906 

6.42500999 

6.90742248 
7.28385659 



172 YLR098C (CHA4) 2.05280928 4.75643469 5.58664651 



180 YDR389W (SAC7) 

202 YDL088C (ASM4) 

206 YBL005W (PDR3) 

242 YGL071W (RCS1) 

255 YAR044W (OSH1) 



3.89197011 
4.39685251 



4.56095992 
4 17572645 



4.31431086 
3.32103404 



3.75060207 4.14490535 6.08273054 



3.39203358 
4.12112011 



3 69630773 
3 624939 



4.53101664 

3.88396219 



256 YLR120C (YAP3) 



6.14265883 3.62298451 4.42985615 



260 YJR017C (ESS1) 



2.98118086 3.55874146 3.22082555 



Gene Description 

Basic helix-loop-helix (BHLH) 
transcription factor involved in 
inter-organelle communication 
between mitochondria, 
peroxisomes, and nucleus 
Protein that amplifies the 
magnitude of transcriptional 
regulation at various loci 
Zinc-finger transcriptional activator 
for genes involved in the 
multistress response and genes 
regulated through SNF1P 
Ac y I - CO A : s tero 1 ac y 1 tr ans fe r a se 
(sterol-ester synthetase) 
Regulator of phospholipid 
metabolism 

Suppressor of silencing defect 
GATA-type zinc finger 
transcriptional repressor for 
allantoin and 4-aminobutyric acid 
(GAB A) catabolic genes 
Zinc -finger protein required for 
activation of CHA1, has A ZN[2]- 
CYS[6] fungal-type binuclear 
cluster domain 

GTPase-activating protein for 
RHOIP 

Suppressor of temperature- 
sensitive mutations in POL3P 
(DNA polymerase delta) 
Transcription factor related to 
PDR1P, contains A ZN[2]-CYS[6] 
fungal-type binuclear cluster 
domain in the N-terminal region 
Regulatory protein involved in iron 
uptake 

Protein implicated in ergosterol 
biosynthesis, member of the 
KES 1/HES 1/OSH 1/YKR003W 
family of oxysterol-binding 
(OSBP) proteins 
Transcription factor of the basic 
leucine zipper (BZIP) family, one 
of eight members of a novel 
fungal-specific family of BZIP 
proteins 

Processing/termination factor, 
involved in transcription 
termination or 3'-end processing of 
pre-MRNA 
Seq.
Num. 

271 



278 
289 

290 
296 



297 



301 



322 



331 



336 



345 
346 



349 
359 
367 



Clone ID 

YMR019W 



YLR153C 
YPL119C 

YGL014W 



YLR249W 



YOR337W 



YJR019C 
YBL008W 



YGL001C 
YCL037C 
YDR0S8C 



ALIAS 

(STB4) 



(ACS2) 
(DBP1) 



CJ-4hr/ 
LP-4hr 

3.2576922 



3.45528019 
5.87199247 



K-50/CK 

3.44146214 



3.37854457 
3.24642228 



K-100/CK 

3 39764598 



3 12858117 
2 19736599 



YGL192W (IME4) 



3.11296478 3.22942947 3 6382821 



2 . 8903095 3 3.1 703 8103 6 47 84 1 05 3 



YMR047C (NUP116) 2.56622055 3.17022339 4 77420515 



(YEF3) 
(TEA1) 



YCR084C (TUP1) 



YPL226W 



3.59397167 3.14453335 2.63195398 



2.13152473 2.97907151 4 82285812 



2.40138822 2.92198431 3 27182635 



2.45263084 2.88856775 2.55579443 



(TES1) 
(HIR1) 



4.07777555 
7.24580603 



2.83032346 
2.82847131 



2 07248965 
2 88668127 



(SR09) 
(SLU7) 



3.91981575 
8.35007693 
2.07293165 



2.82148161 
2.78557477 
2.73396273 



1 98527852 

2 39358801 
2.68767436 



Gene Description 

SIN3P-binding protein, has ZN[2]- 
CYS[6] fungal-type binuclear 
cluster domain in the N-terminal 
region 

Acetyl-COA synthetase (aeetate- 
COA ligase) 

ATP-dependent RNA helicase of 
dead box family, suppressor of 
SPP8 1/DED1 

Protein with pumilio repeats that is 
involved with MPT5P in 
relocalization of SIR3P and SIR4P 
from telomeres to the nucleolus 
Positive transcription factor for 
IME1 and IME2, mediates control 
of meiosis by carrying signals 
regarding mating type (A/alpha) 
and nutritional status 
Nuclear pore protein (nucleoporin) 
of the GLFG family, may be 
involved in binding and 
translocation of nuclear proteins 
Translation elongation factor EF- 
3 A, member of ATP-binding 
cassette (ABC) superfamily 
TY1 enhancer activator of the 
GAL4P-type family of DNA- 
binding proteins 

General repressor of transcription 
(with SSN6P), member of WD 
(WD-40) repeat family 
Protein with similarity to members 
of the ATP-binding cassette (ABC) 
superfamily 
Acyl-COA thioesterase 
Histone transcription inhibitor, 
required for periodic repression of 
3 of the 4 histone gene loci and for 
autogenous repression of HTA1- 
HTB1 locus by H2A and H2B 
Protein with similarity to nocardia 
SP. cholesterol dehydrogenase 
Suppressor of YPT6 null and 
RH03 mutations 
Pre-MRNA splicing factor 
affecting 3' splice site choice, 
required only for the second 
catalytic step of splicing 



Seq. 

Num. Clone ID 

376 YBR289W 



400 VPL237W 
406 YOR127W 
416 YFR009W 



417 

440 
441 

442 

492 
496 

503 
504 
515 

526 



549 



YGR270W 
YBR119W 



YAL04SC 

YHL027W 
YGL1 12C 

YPL02SW 



ALIAS 

(SNF5) 



CJ-4hr/ 
LP-4hr 

2 00671327 



K-50/CK 

2.68812945 



K-100/CK 

3 03861899 



(SUI3) 

(RGA1) 

(GCN20) 



2 5966981 

3 85804733 
2.637821 18 



2.5628077 

2.53166489 

2.48595438 



2 54794558 
2 56973414 
2 16133777 



YDR211W (GCD6) 



2.22567451 2.48354852 1 82406386 



(YTA7) 
(MUD1) 



2 68803581 
2 83912216 



2.4056715 
2.40515252 



1 94575504 
1 39876418 



YDR052C (DBF4) 



YPR080W (TEF1) 



6 85835185 2.40369283 1 58349756 



~> 50402057 2.21150946 1 85878786 



YDR264C (AKR1) 3 13151279 2.19839665 2.75165355 



YDR422C (SIP1) 



2 62373247 2.18707608 2 08363472 



YMROSOC (NAM7) 2 82340116 2.1828046 2 17148277 



5.02313141 2.13847476 4 22113197 



(RIM101) 

(TAF60) 

(FRG10) 



2 57210755 
2 21463331 



2.10331571 
2.07652653 



2.59278915 
2.18913076 



2 86559 13S 2.01859514 1.69893374 



Gene Description 

Component of SWI/SNF global 
transcription activator complex, 
acts to assist gene-specific 
activators through chromatin 
remodeling 

Translation initiation factor 
EIF2beta subunit 
RHO-type GTPase- activating 
protein (GAP) for CDC42P 
Component of a protein complex 
required for activation of GCN2P 
protein kinase in response to amino 
acid starvation, member of ATP- 
binding cassette (ABC) 
superfamily 

Translation initiation factor EIF2B 
(guanine nucleotide exchange 
factor), 81 KDA (beta) subunit 
Protein with similarity to members 
of the AAA family of AT Pases 
Ul SNRNP A protein (SNRNA- 
associated protein) with 2 RNA 
recognition (RRM) domains 
Regulatory subunit for CDC7P 
protein kinase, required for Gl/S 
transition 

Translation elongation factor EF- 

1 alpha (TEF1 and TEF2 code for 

identical proteins) 

Ankyrin repeat-containing protein 

that has an inhibitory effect on 

signaling in the pheromone 

pathway 

Multicopy suppressor of SNF1, 
related to GAL83P/SPM1P and 
SPM2P 

Protein involved with NMD2P and 
UPF3P in decay of MRNA 
containing nonsense codons 
Protein with weak similarity to 
RAS1P, RAS2P, and other GTP- 
binding proteins of the RAS 
superfamily 

Zinc-finger protein involved in 

induction of I ME 1 

Component of TAF(II) complex 

(TBP-assoeiated protein complex) 

required for activated transcription 

by RNA polymerase II 

Ac etyl- C O A acet y 1 1 ra ns ferase 

(acetoacet) l-COA thiolase). first 

step in mevalonate/sterol pathway 
Seq. Num.  Clone ID  ALIAS  CJ-4hr/LP-4hr  K-50/CK  K-100/CK  Gene Description

621  YNR019W  (ARE2)  2.1  1.79103463  2.85442  Acyl-COA:sterol acyltransferase (sterol-ester synthetase)
560  YFL054C  -  7.43223753  0.47935821  0.61365816  Protein with similarity to FPS1P and YPR192P, member of MIP family of transmembrane channels
562  YAL001C  (TFC3)  2.94740587  0.47427458  1.19155655  RNA polymerase transcription initiation factor TFIIIC (TAU), 138 KDA subunit
579  YIR017C  (MET28)  2.97658904  0.3775372  0.30088953  Transcriptional activator of the basic leucine zipper (BZIP) family, works with MET4P and CBF1P to regulate sulfur amino acid metabolism
585  YDR253C  (MET32)  2.86314943  0.33971751  0.32790428  Zinc-finger protein involved in transcriptional regulation of methionine metabolism
595  YMR146C  (TIF34)  5.0351989  0.22482039  0.71935381  Translation initiation factor EIF3, P39 subunit, has 2 WD (WD-40) repeats
616  YGR083C  (GCD2)  2.32134339  0  [illegible]  Translation initiation factor EIF2B (guanine nucleotide exchange factor), 71 KDA (delta) subunit
610  YOR139C  (SFL1)  5.46648132  0  11.760995  Transcription factor with domains homologous to MYC oncoprotein and yeast HSF1P, required for normal cell surface assembly and flocculence



* Table Headings:

Clone ID : A clone ID designation number.

CJ-4hr/LP-4hr : Expression level in the mutant CJ517 as compared with the
respective wild type strain LPY9 at 4 hr sampling of log phase growth of yeast
(ratio of mutant expression level/control expression level). Genes in the Table are
either up or down regulated in the mutant (CJ517) as well as in response to
ketoconazole treatment.

K-50/CK : Expression level in the wild type yeast LPY9, at 2 hr after treatment
with 50 microgram/ml ketoconazole as compared to the wild type LPY9 strain
without ketoconazole treatment (ratio of treatment expression level/control
expression level).

K-100/CK : Expression level in the wild type yeast LPY9, at 2 hr after treatment
with 100 microgram/ml ketoconazole as compared to the wild type LPY9 strain
without ketoconazole treatment (ratio of treatment expression level/control
expression level).

Gene Description : Description of the clone listed in column 1.
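
As a rough illustration of the statement that the listed genes change both in the mutant and in response to ketoconazole, the sketch below selects clones whose ratios depart from 1 in both comparisons. The `Row` type, the function names, the two-fold cutoff, and the `EXAMPLE1` entry are hypothetical; the two named clones and their ratios are taken from Table 3.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    clone_id: str
    cj_lp: float   # CJ-4hr/LP-4hr ratio
    k50_ck: float  # K-50/CK ratio


def is_regulated(ratio: float, fold_cutoff: float = 2.0) -> bool:
    """True if the ratio is at least fold_cutoff up or down (assumed cutoff)."""
    return ratio >= fold_cutoff or ratio <= 1.0 / fold_cutoff


def regulated_in_both(rows: List[Row], fold_cutoff: float = 2.0) -> List[str]:
    """Clone IDs that change in the mutant and under ketoconazole treatment."""
    return [
        r.clone_id
        for r in rows
        if is_regulated(r.cj_lp, fold_cutoff) and is_regulated(r.k50_ck, fold_cutoff)
    ]


rows = [
    Row("YIR017C", 2.97658904, 0.3775372),   # MET28, values from Table 3
    Row("YDR253C", 2.86314943, 0.33971751),  # MET32, values from Table 3
    Row("EXAMPLE1", 1.1, 0.9),               # hypothetical unchanged clone
]
print(regulated_in_both(rows))
```

The direction of change is not required to match between the two comparisons, since the headings only require that a gene be up or down regulated in each.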

In addition, for example, Table 2 identifies a yeast HES1 gene as a gene with an
associated change in mRNA levels in the two different comparisons. Fang et al., EMBO J.
15:6447-59 (1996), the entirety of which is herein incorporated by reference, reported a mutation
in HES1, which caused a 55% reduction in carbon flux through the mevalonate pathway in yeast.

Each of the sequences listed in Table 2 or 3 represents a gene that affects sterol levels,
directly or indirectly, or whose expression changes as a result of alterations in the sterol synthesis
pathway.

Example 2 

Sequences that encode the yeast HES1 protein are used to search databases for
homologues from other species. A number of different databases can be used for these searches,
including, for example, dbEST, GenBank, EMBL, SwissProt, PIR, and GENES. In addition,
various algorithms for searching can be selected, such as, for example, the BLAST suite of
programs at the default values. Typically, matches found with BLAST P values equal to or less
than 0.001 (probability) or a BLAST score equal to or greater than 90 are classified as hits. If the
program used to determine the hit is HMMSW, then the score refers to the HMMSW score. The
GenBank database is searched with BLASTN and BLASTX (default values). Sequences that
pass the hit probability threshold of 10e" H are considered hits. 
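
The hit criteria described in this example (a BLAST P value of at most 0.001, or a BLAST score of at least 90) can be sketched as a simple filter. The `BlastMatch` record, its field names, and the subject identifiers are hypothetical and do not correspond to the output format of any particular BLAST program.

```python
from dataclasses import dataclass
from typing import List

P_VALUE_CUTOFF = 0.001  # matches with P <= 0.001 are classified as hits
SCORE_CUTOFF = 90.0     # matches with score >= 90 are classified as hits


@dataclass(frozen=True)
class BlastMatch:
    subject_id: str  # hypothetical identifier of the matched sequence
    p_value: float
    score: float


def is_hit(match: BlastMatch) -> bool:
    """Apply the two thresholds; satisfying either one is sufficient."""
    return match.p_value <= P_VALUE_CUTOFF or match.score >= SCORE_CUTOFF


def select_hits(matches: List[BlastMatch]) -> List[BlastMatch]:
    """Keep only the matches classified as hits."""
    return [m for m in matches if is_hit(m)]


matches = [
    BlastMatch("subject_A", p_value=1e-12, score=250.0),  # passes both
    BlastMatch("subject_B", p_value=0.02, score=95.0),    # passes on score
    BlastMatch("subject_C", p_value=0.5, score=40.0),     # passes neither
]
print([m.subject_id for m in select_hits(matches)])
```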

Table 4 

Seq. Num.  Clone ID  Sequence: DNA/Protein  Hit description  Library

1  701100307CPR9855  DNA  Yeast HES1 homolog  SOYMON028
2  701001443CPR9S57  DNA  Yeast HES1 homolog  SOYMON018
3  701010572CPR9854  DNA  Yeast HES1 homolog  SOYMON019
4  701176735CPR9736  DNA  Yeast HES1 homolog  SATMONN05




Seq. 
Arabidopsis HES1 homolog
Arabidopsis HES1 homolog





Homologues to yeast HES1 are also identified in the following libraries: SOYMON003,
SOYMON005, SOYMON006, SOYMON009, SOYMON018, SOYMON019, SOYMON020,
SOYMON022, SOYMON028, SOYMON023, SOYMON032, SOYMON034, SOYMON027,
SATMONN05, LIB22, LIB24, and LIB25. These libraries are prepared as follows:
The SATMONN05 cDNA library is a normalized library generated from maize (B73 x
Mo17, Illinois Foundation Seeds, Champaign, Illinois U.S.A.) root tissue at the V6 development
stage. Seeds are planted at a depth of approximately 3 cm into 2-3 inch peat pots containing
Metro 200 growing medium. After 2-3 weeks of growth they are transplanted into 10 inch pots
containing the same growing medium. Plants are watered daily before transplantation and three
times a week after transplantation. Peters 15-16-17 fertilizer is applied three times per week after
transplanting at a strength of 150 ppm N. Two to three times during the lifetime of the plant,
from transplanting to flowering, a total of 900 mg Fe is added to each pot. Maize plants are
grown in a greenhouse in 15 hr day/9 hr night cycles. The daytime temperature is approximately
80°F and the nighttime temperature is approximately 70°F. Supplemental lighting is provided by
1000 W sodium vapor lamps. Tissue is collected when the maize plant is at the 6-leaf
development stage. The root system is cut from the mature maize plant and washed with water
to free it from the soil. The tissue is immediately frozen in liquid nitrogen and the harvested
tissue is then stored at -80°C until RNA preparation. The RNA is purified from the stored
tissue. The library is normalized in two rounds using conditions adapted from Soares et al.,
Proc. Natl. Acad. Sci. (U.S.A.) 91:9228 (1994), the entirety of which is herein incorporated by
reference, and Bonaldo et al., Genome Res. 6:791 (1996), the entirety of which is herein
incorporated by reference, except that a significantly longer (48 hours/round) reannealing
hybridization is used. SATMON003 is a root tissue library from the same donor.

The SOYMON003 cDNA library is generated from soybean cultivar Asgrow 3244
(Asgrow Seed Company, Des Moines, Iowa U.S.A.) hypocotyl axis tissue from seedlings 2 days
after imbibition. Seeds are planted at a depth of approximately 2 cm into 2-3 inch peat pots
containing Metromix 350 medium. Trays are placed in an environmental chamber and grown at
12 hr daytime/12 hr nighttime cycles. The daytime temperature is approximately 29°C and the
nighttime temperature approximately 24°C. Soil is checked and watered daily to maintain even
moisture conditions. Tissue is collected 2 days after the start of imbibition. The 2 days after
imbibition samples are separated into 3 collections after removal of any adhering seed coat. At 2
days after imbibition under the above conditions, the seedlings have significant expansion of the
axis and are close to emerging from the soil. A few seedlings have cracked the soil surface and
exhibited slight greening of the exposed cotyledons. The seedlings are washed in water to
remove soil, then the hypocotyl axis is harvested and immediately frozen in liquid nitrogen. The
harvested tissue is then stored at -80°C until RNA preparation. The RNA is purified from the
stored tissue and the cDNA library is constructed.

The stored RNA is purified using Trizol reagent from Life Technologies (Gibco BRL,
Life Technologies, Gaithersburg, Maryland U.S.A.), essentially as recommended by the
manufacturer. Poly A+ RNA (mRNA) is purified using magnetic oligo dT beads essentially as
recommended by the manufacturer (Dynabeads, Dynal Corporation, Lake Success, New York
U.S.A.).

Construction of plant cDNA libraries is well-known in the art and a number of cloning
strategies exist. A number of cDNA library construction kits are commercially available. The
Superscript™ Plasmid System for cDNA synthesis and Plasmid Cloning (Gibco BRL, Life
Technologies, Gaithersburg, Maryland U.S.A.) is used, following the conditions suggested by the
manufacturer.

The SOYMON005 cDNA library is generated from soybean cultivar Asgrow 3244
(Asgrow Seed Company, Des Moines, Iowa U.S.A.) hypocotyl axis tissue from seeds 6 hours
post-imbibition. Seeds are planted at a depth of approximately 2 cm into 2-3 inch peat pots
containing Metromix 350 medium. Trays are placed in an environmental chamber and grown at
12 hr daytime/12 hr nighttime cycles. The daytime temperature is approximately 29°C and the
nighttime temperature approximately 24°C. Soil is checked and watered daily to maintain even
moisture conditions. Tissue is collected 6 hours after the start of imbibition. The 6 hours after
imbibition sample is collected over the course of approximately 2 hours starting at 6 hours post-
imbibition. At the 6 hours after imbibition stage, not all cotyledons have become fully hydrated
and germination (radicle protrusion) has not occurred. The seedlings are washed in water to
remove soil, then the hypocotyl axis is harvested and immediately frozen in liquid nitrogen. The
harvested tissue is then stored at -80°C until RNA preparation. The RNA is purified from the
stored tissue and the cDNA library is constructed.

The stored RNA is purified using Trizol reagent from Life Technologies (Gibco BRL,
Life Technologies, Gaithersburg, Maryland U.S.A.), essentially as recommended by the
manufacturer. Poly A+ RNA (mRNA) is purified using magnetic oligo dT beads essentially as
recommended by the manufacturer (Dynabeads, Dynal Corporation, Lake Success, New York
U.S.A.).

Construction of plant cDNA libraries is well-known in the art and a number of cloning
strategies exist. A number of cDNA library construction kits are commercially available. The
Superscript™ Plasmid System for cDNA synthesis and Plasmid Cloning (Gibco BRL, Life
Technologies, Gaithersburg, Maryland U.S.A.) is used, following the conditions suggested by the
manufacturer.

The SOYMON006 cDNA library is generated from soybean cultivar Asgrow 3244
(Asgrow Seed Company, Des Moines, Iowa U.S.A.) cotyledons from seeds 6 hours post-
imbibition. Seeds are planted at a depth of approximately 2 cm into 2-3 inch peat pots containing
Metromix 350 medium. Trays are placed in an environmental chamber and grown at 12 hr
daytime/12 hr nighttime cycles. The daytime temperature is approximately 29°C and the
nighttime temperature approximately 24°C. Soil is checked and watered daily to maintain even
moisture conditions. Tissue is collected 6 hours after imbibition. The 6 hours after imbibition
sample is collected over the course of approximately 2 hours starting at 6 hours post-imbibition.
At 6 hours after imbibition, not all cotyledons have become fully hydrated and germination
(radicle protrusion) has not occurred. The seedlings are washed in water to remove soil, then the
cotyledon is harvested and immediately frozen in liquid nitrogen. The harvested tissue is then
stored at -80°C until RNA preparation. The RNA is purified from the stored tissue and the
cDNA library is constructed.

The stored RNA is purified using Trizol reagent from Life Technologies (Gibco BRL, 
Life Technologies, Gaithersburg, Maryland U.S.A.), essentially as recommended by the 
manufacturer. Poly A+ RNA (mRNA) is purified using magnetic oligo dT beads essentially as
recommended by the manufacturer (Dynabeads, Dynal Corporation, Lake Success, New York
U.S.A.). 






Construction of plant cDNA libraries is well-known in the art and a number of cloning 
strategies exist. A number of cDNA library construction kits are commercially available. The 
Superscript™ Plasmid System for cDNA synthesis and Plasmid Cloning (Gibco BRL, Life 
Technologies, Gaithersburg, Maryland U.S.A.) is used, following the conditions suggested by the 
manufacturer.

The SOYMON009 cDNA library is generated from soybean cultivar C1944 (USDA
Soybean Germplasm Collection, Urbana, Illinois U.S.A.) pod and seed tissue harvested 15 days
post-flowering. Pods from field-grown plants are harvested 15 days post-flowering. The pods
are picked from all over the plant, placed into 14ml polystyrene tubes and immediately immersed
in dry-ice. Approximately 3g of pod tissue is harvested. The harvested tissue is then stored at
-80°C until RNA preparation. The RNA is purified from the stored tissue and the cDNA library is
constructed.

The stored RNA is purified using Trizol reagent from Life Technologies (Gibco BRL, 
Life Technologies, Gaithersburg, Maryland U.S.A.), essentially as recommended by the
manufacturer. Poly A+ RNA (mRNA) is purified using magnetic oligo dT beads essentially as
recommended by the manufacturer (Dynabeads, Dynal Corporation, Lake Success, New York 
U.S.A.). 

Construction of plant cDNA libraries is well-known in the art and a number of cloning 
strategies exist. A number of cDNA library construction kits are commercially available. The 
Superscript™ Plasmid System for cDNA synthesis and Plasmid Cloning (Gibco BRL, Life
Technologies, Gaithersburg, Maryland U.S.A.) is used, following the conditions suggested by the
manufacturer. 

The SOYMON018 cDNA library is generated from soybean cultivar Asgrow 3244 (Asgrow
Seed Company, Des Moines, Iowa U.S.A.) leaf tissue harvested from plants grown in a field in
Jerseyville 45 and 55 days after flowering. Leaves from field-grown plants are harvested 45 and
55 days after flowering from the fourth node. Approximately 27g and 33g of leaves are collected
from the plants at 45 and 55 days after flowering, respectively, placed into 14ml polystyrene tubes and
immediately immersed in dry ice. The harvested tissue is then stored at -80°C until RNA
preparation. Total RNA is prepared from the combination of equal amounts of leaf tissue from
both time points and the cDNA library is constructed.

The stored RNA is purified using Trizol reagent from Life Technologies (Gibco BRL, 
Life Technologies, Gaithersburg, Maryland U.S.A.), essentially as recommended by the
manufacturer. Poly A+ RNA (mRNA) is purified using magnetic oligo dT beads essentially as
recommended by the manufacturer (Dynabeads, Dynal Corporation, Lake Success, New York 
U.S.A.). 

Construction of plant cDNA libraries is well-known in the art and a number of cloning 
strategies exist. A number of cDNA library construction kits are commercially available. The
Superscript™ Plasmid System for cDNA synthesis and Plasmid Cloning (Gibco BRL, Life 
Technologies, Gaithersburg, Maryland U.S.A.) is used, following the conditions suggested by the 
manufacturer. 

The SOYMON019 cDNA library is generated from soybean cultivars Cristalina (USDA
Soybean Germplasm Collection, Urbana, Illinois U.S.A.) and FT108 (Monsoy, Brazil) (tropical
germplasm) root tissue. Roots are harvested from plants grown in an environmental chamber
under 12hr daytime/12hr nighttime cycles. The daytime temperature is approximately 29°C and
the nighttime temperature approximately 24°C. Soil is checked and watered daily to maintain
even moisture conditions. Approximately 50g and 56g of roots are harvested from each of the
Cristalina and FT108 cultivars and immediately frozen in dry ice. The plants are uprooted and
the roots quickly rinsed in a pail of water. The root tissue is then cut from the plants, placed
immediately in 14ml polystyrene tubes and immersed in dry-ice. The harvested tissue is then
stored at -80°C until RNA preparation. Total RNA is prepared from the combination of equal
amounts of root tissue from each cultivar and the cDNA library is constructed.

The stored RNA is purified using Trizol reagent from Life Technologies (Gibco BRL,
Life Technologies, Gaithersburg, Maryland U.S.A.), essentially as recommended by the
manufacturer. Poly A+ RNA (mRNA) is purified using magnetic oligo dT beads essentially as
recommended by the manufacturer (Dynabeads, Dynal Corporation, Lake Success, New York
U.S.A.).

Construction of plant cDNA libraries is well-known in the art and a number of cloning 
strategies exist. A number of cDNA library construction kits are commercially available. The 
Superscript™ Plasmid System for cDNA synthesis and Plasmid Cloning (Gibco BRL, Life
Technologies, Gaithersburg, Maryland U.S.A.) is used, following the conditions suggested by the
manufacturer. 

The SOYMON020 cDNA library is generated from soybean cultivar Asgrow 3244 (Asgrow
Seed Company, Des Moines, Iowa U.S.A.) seeds harvested from plants grown in a field in
Jerseyville 65 and 75 days post-flowering. The seed pods are picked from all over the plant and
the seeds extracted from the pods. Approximately 14g and 31g of seeds are harvested from the
respective seed pods and immediately frozen in dry ice. The harvested tissue is then stored at
-80°C until RNA preparation. Total RNA is prepared from the combination of equal numbers of
seeds from 65 and 75 days after flowering and the cDNA library is constructed.

The stored RNA is purified using Trizol reagent from Life Technologies (Gibco BRL,
Life Technologies, Gaithersburg, Maryland U.S.A.), essentially as recommended by the
manufacturer. Poly A+ RNA (mRNA) is purified using magnetic oligo dT beads essentially as 
recommended by the manufacturer (Dynabeads, Dynal Corporation, Lake Success, New York 
U.S.A.). 

Construction of plant cDNA libraries is well-known in the art and a number of cloning
strategies exist. A number of cDNA library construction kits are commercially available. The
Superscript™ Plasmid System for cDNA synthesis and Plasmid Cloning (Gibco BRL, Life 
Technologies, Gaithersburg, Maryland U.S.A.) is used, following the conditions suggested by the 
manufacturer. 

The SOYMON022 cDNA library is generated from soybean cultivar Asgrow 3244
(Asgrow Seed Company, Des Moines, Iowa U.S.A.) partially to fully opened flower tissue,
which is harvested from plants grown in an environmental chamber. Seeds are planted in moist
Metromix 350 medium at a depth of approximately 2cm. Trays are placed in an environmental
chamber set to a 12h day/12h night cycle, 29°C daytime temperature, 24°C night temperature and
70% relative humidity. Daytime light levels are measured at 450 µEinsteins/m². Soil is checked
and watered daily to maintain even moisture conditions. Flowers are removed from the plant at
the pedicel. Flower buds showing petal color to fully open flowers are selected for collection. A
total of 3g of flower tissue is harvested and immediately frozen in dry ice. The harvested tissue
is then stored at -80°C until RNA preparation. Total RNA is prepared from a mixture of opened
and partially opened flowers and the cDNA library is constructed.

The stored RNA is purified using Trizol reagent from Life Technologies (Gibco BRL, 
Life Technologies, Gaithersburg, Maryland U.S.A.), essentially as recommended by the
manufacturer. Poly A+ RNA (mRNA) is purified using magnetic oligo dT beads essentially as
recommended by the manufacturer (Dynabeads, Dynal Corporation, Lake Success, New York 
U.S.A.). 

Construction of plant cDNA libraries is well-known in the art and a number of cloning 
strategies exist. A number of cDNA library construction kits are commercially available. The
Superscript™ Plasmid System for cDNA synthesis and Plasmid Cloning (Gibco BRL, Life 
Technologies, Gaithersburg, Maryland U.S.A.) is used, following the conditions suggested by the 
manufacturer. 

The SOYMON023 cDNA library is generated from soybean genotype BW211S Null
(Tohoku University, Morioka, Japan) seed tissue harvested from plants grown in a field in
Jerseyville. After 15 and 40 days, pods are harvested from all over the plant and seeds are
dissected out from the pods. Approximately 0.7g and 14.2g of seeds are harvested from the
plants at the 15 and 40 days after flowering timepoints. The seeds are placed into 14ml
polystyrene tubes and immersed in dry-ice. The tissue is then transferred to a -80°C freezer for
storage. The harvested tissue is then stored at -80°C until RNA preparation. Total RNA is
prepared from the combination of 0.5g and 1.0g of seeds from the 15 and 40 days after flowering
timepoints and the cDNA library is constructed.

The stored RNA is purified using Trizol reagent from Life Technologies (Gibco BRL,
Life Technologies, Gaithersburg, Maryland U.S.A.), essentially as recommended by the
manufacturer. Poly A+ RNA (mRNA) is purified using magnetic oligo dT beads essentially as
recommended by the manufacturer (Dynabeads, Dynal Corporation, Lake Success, New York
U.S.A.).

Construction of plant cDNA libraries is well-known in the art and a number of cloning 
strategies exist. A number of cDNA library construction kits are commercially available. The 
Superscript™ Plasmid System for cDNA synthesis and Plasmid Cloning (Gibco BRL, Life 
Technologies, Gaithersburg, Maryland U.S.A.) is used, following the conditions suggested by the 
manufacturer.

The SOYMON028 cDNA library is generated from soybean cultivar Asgrow 3244
(Asgrow Seed Company, Des Moines, Iowa U.S.A.) drought-stressed root tissue. Seeds are
planted in moist Metromix 350 medium at a depth of approximately 2cm in trays. The trays are
placed in an environmental chamber set to a 12h day/12h night cycle, 26°C daytime temperature,
21°C night temperature and 70% relative humidity. Daytime light levels are measured at
300 µEinsteins/m². Soil is checked and watered daily to maintain even moisture conditions. At
the R3 stage of development, water is withheld from half of the plant collection (drought stressed
population). After 3 days, half of the plants from the drought stressed condition and half of the
plants from the control population are harvested. After another 3 days (6 days post drought
induction) the remaining plants are harvested. A total of 27g and 40g of root tissue is harvested
from plants at two time points and immediately frozen in dry ice. The harvested tissue is then
stored at -80°C until RNA preparation. Total RNA is prepared from the combination of equal
amounts of drought stressed root tissue from both time points and the cDNA library is
constructed.

The stored RNA is purified using Trizol reagent from Life Technologies (Gibco BRL,
Life Technologies, Gaithersburg, Maryland U.S.A.), essentially as recommended by the
manufacturer. Poly A+ RNA (mRNA) is purified using magnetic oligo dT beads essentially as
recommended by the manufacturer (Dynabeads, Dynal Corporation, Lake Success, New York
U.S.A.).

Construction of plant cDNA libraries is well-known in the art and a number of cloning 
strategies exist. A number of cDNA library construction kits are commercially available. The 
Superscript™ Plasmid System for cDNA synthesis and Plasmid Cloning (Gibco BRL, Life
Technologies, Gaithersburg, Maryland U.S.A.) is used, following the conditions suggested by the
manufacturer. 

The SOYMON032 cDNA library is prepared from the Asgrow cultivar A4922 (Asgrow
Seed Company, Des Moines, Iowa U.S.A.) rehydrated dry soybean seed meristem tissue.
Surface-sterilized seeds are germinated in liquid media for 24 hours. The seed axis is then excised from
the barely germinating seed, placed on tissue culture media and incubated overnight at 20°C in
the dark. The supportive tissue is removed from the explant prior to harvest. Approximately
570mg of tissue is harvested and frozen in liquid nitrogen. The harvested tissue is then stored at
-80°C until RNA preparation. The RNA is purified from the stored tissue and the cDNA library
is constructed.

The stored RNA is purified using Trizol reagent from Life Technologies (Gibco BRL, 
Life Technologies, Gaithersburg, Maryland U.S.A.), essentially as recommended by the 
manufacturer. Poly A+ RNA (mRNA) is purified using magnetic oligo dT beads essentially as 
recommended by the manufacturer (Dynabeads, Dynal Corporation, Lake Success, New York 
U.S.A.).

Construction of plant cDNA libraries is well-known in the art and a number of cloning 
strategies exist. A number of cDNA library construction kits are commercially available. The 
Superscript™ Plasmid System for cDNA synthesis and Plasmid Cloning (Gibco BRL, Life 
Technologies, Gaithersburg, Maryland U.S.A.) is used, following the conditions suggested by the 
manufacturer.

The SOYMON034 cDNA library is generated from soybean cultivar Asgrow 3244
(Asgrow Seed Company, Des Moines, Iowa U.S.A.) cold-shocked seedling tissue without
cotyledons. Seeds are imbibed and germinated in vermiculite for 2 days under constant
illumination (ca. 510 Lux). After 48 hours, the seedlings are transferred to a cold room set at 5°C
under constant illumination (ca. 560 Lux). After 30, 60 and 180 minutes seedlings are harvested
and dissected. The seedlings after 2 days of imbibition are beginning to emerge from the
vermiculite surface. The apical hooks are dark green in appearance. A portion of the seedling
consisting of the root, hypocotyl and apical hook is frozen in liquid nitrogen and stored at -80°C.
Total RNA is prepared from equal amounts of pooled tissue and the cDNA library is constructed.

The stored RNA is purified using Trizol reagent from Life Technologies (Gibco BRL, 
Life Technologies, Gaithersburg, Maryland U.S.A.), essentially as recommended by the 
manufacturer. Poly A+ RNA (mRNA) is purified using magnetic oligo dT beads essentially as
recommended by the manufacturer (Dynabeads, Dynal Corporation, Lake Success, New York 
U.S.A.). 

Construction of plant cDNA libraries is well-known in the art and a number of cloning 
strategies exist. A number of cDNA library construction kits are commercially available. The 
Superscript™ Plasmid System for cDNA synthesis and Plasmid Cloning (Gibco BRL, Life
Technologies, Gaithersburg, Maryland U.S.A.) is used, following the conditions suggested by the
manufacturer. 

The SOYMON037 cDNA library is generated from soybean cultivar A3244 (Asgrow
Seed Company, Des Moines, Iowa U.S.A.) etiolated axis and radicle tissue. Seeds are planted in
moist vermiculite, wrapped and kept at room temperature in complete darkness until harvest.
Etiolated axis and hypocotyl tissue is harvested at 2, 3 and 4 days post-planting. Samples are
frozen in liquid nitrogen upon harvesting and stored at -80°C until RNA preparation. One gram of
each sample (axis + hypocotyl at day 2, 3 and 4) is pooled for RNA isolation. The RNA is
purified from the pooled tissue and the cDNA library is constructed.

The stored RNA is purified using Trizol reagent from Life Technologies (Gibco BRL,
Life Technologies, Gaithersburg, Maryland U.S.A.), essentially as recommended by the
manufacturer. Poly A+ RNA (mRNA) is purified using magnetic oligo dT beads essentially as
recommended by the manufacturer (Dynabeads, Dynal Corporation, Lake Success, New York
U.S.A.).

Construction of plant cDNA libraries is well-known in the art and a number of cloning 
strategies exist. A number of cDNA library construction kits are commercially available. The 
Superscript™ Plasmid System for cDNA synthesis and Plasmid Cloning (Gibco BRL, Life
Technologies, Gaithersburg, Maryland U.S.A.) is used, following the conditions suggested by the
manufacturer. 

The cDNA library of the present invention designated LIB22 is prepared from
Arabidopsis thaliana Columbia ecotype root tissue. Wild type Arabidopsis thaliana seeds are
planted in commonly used planting pots and grown in an environmental chamber. After 5-6
weeks the plants are in the reproductive growth phase. Stems are bolting from the base of the
plants. After 7 weeks, more stems and floral buds appear, and a few flowers are starting to open.
Roots of 7-week-old plants from pots are rinsed intensively with tap water to wash away dirt, and
briefly blotted with paper towel to remove free water. The tissues are immediately frozen in
liquid nitrogen and stored at -80°C until total RNA extraction.

The stored RNA is purified using Trizol reagent from Life Technologies (Gibco BRL, 
Life Technologies, Gaithersburg, Maryland U.S.A.), essentially as recommended by the 
manufacturer. Poly A+ RNA (mRNA) is purified using magnetic oligo dT beads essentially as 
recommended by the manufacturer (Dynabeads, Dynal Corporation, Lake Success, New York
U.S.A.).

Construction of plant cDNA libraries is well-known in the art and a number of cloning 
strategies exist. A number of cDNA library construction kits are commercially available. The 
Superscript™ Plasmid System for cDNA synthesis and Plasmid Cloning (Gibco BRL, Life 
Technologies, Gaithersburg, Maryland U.S.A.) is used, following the conditions suggested by the 
manufacturer.

The cDNA library of the present invention designated LIB24 is prepared from
Arabidopsis thaliana, Columbia ecotype, flower bud tissue. Wild type Arabidopsis thaliana
seeds are planted in commonly used planting pots and grown in an environmental chamber.
Flower buds are green and unopened and are harvested about seven weeks after planting. The
tissue is immediately frozen in liquid nitrogen and stored at -80°C until total RNA extraction.

The stored RNA is purified using Trizol reagent from Life Technologies (Gibco BRL,
Life Technologies, Gaithersburg, Maryland U.S.A.), essentially as recommended by the
manufacturer. Poly A+ RNA (mRNA) is purified using magnetic oligo dT beads essentially as
recommended by the manufacturer (Dynabeads, Dynal Corporation, Lake Success, New York
U.S.A.).

Construction of plant cDNA libraries is well-known in the art and a number of cloning 
strategies exist. A number of cDNA library construction kits are commercially available. The
Superscript™ Plasmid System for cDNA synthesis and Plasmid Cloning (Gibco BRL, Life 
Technologies, Gaithersburg, Maryland U.S.A.) is used, following the conditions suggested by the 
manufacturer. 

The cDNA library of the present invention designated LIB25 is prepared from
Arabidopsis thaliana, Columbia ecotype, open flower tissue. Wild type Arabidopsis thaliana
seeds are planted in commonly used planting pots and grown in an environmental chamber.
Flowers are completely opened with all parts of the floral structure observable, but no siliques are
appearing, and are harvested about seven weeks after planting. The tissue is immediately
frozen in liquid nitrogen and stored at -80°C until total RNA extraction.

The stored RNA is purified using Trizol reagent from Life Technologies (Gibco BRL,
Life Technologies, Gaithersburg, Maryland U.S.A.), essentially as recommended by the
manufacturer. Poly A+ RNA (mRNA) is purified using magnetic oligo dT beads essentially as
recommended by the manufacturer (Dynabeads, Dynal Corporation, Lake Success, New York
U.S.A.).

Construction of plant cDNA libraries is well-known in the art and a number of cloning
strategies exist. A number of cDNA library construction kits are commercially available. The
Superscript™ Plasmid System for cDNA synthesis and Plasmid Cloning (Gibco BRL, Life
Technologies, Gaithersburg, Maryland U.S.A.) is used, following the conditions suggested by the
manufacturer.

Example 3 

Detection of Changes in Sterol Metabolism 

A labeled acetyl-CoA molecule, squalene molecule, or acetate is used in a variety of
assays known in the art to detect changes in sterol production, secretion, localization,
protein-binding, degradation, and trafficking. The example below illustrates.

Cells from transformed plants are cultured in an appropriate medium. Labeled acetate,
preferably 14C-labeled, is added to a concentration of about 1 µCi/ml. After a period of growth,
the cells are collected, and the lipids are extracted and resolved by thin-layer chromatography or run
over an HPLC column using known methods. The level of each resolved sterol can be compared to
that in control cells fed the same 14C-labeled acetate, and the amount of 14C-labeled sterol for each
determined from the resolved sterols.
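
The comparison between transformed and control cells described above can be sketched as a simple per-sterol ratio. This is a minimal illustration only, assuming that a 14C signal (for example, counts from each resolved sterol fraction) has already been measured for both cell populations. The function name, the dictionary layout, the sterol names, and all numeric values are hypothetical and are not taken from this disclosure.

```python
from typing import Dict


def relative_incorporation(
    transformed: Dict[str, float], control: Dict[str, float]
) -> Dict[str, float]:
    """Ratio of 14C signal in transformed cells to control cells, per sterol.

    Sterols with no positive control signal are skipped because the
    ratio is undefined for them.
    """
    ratios = {}
    for sterol, control_signal in control.items():
        if control_signal <= 0:
            continue  # ratio undefined without a control signal
        ratios[sterol] = transformed.get(sterol, 0.0) / control_signal
    return ratios


# Hypothetical counts (arbitrary units), for illustration only.
control_counts = {"sitosterol": 1000.0, "campesterol": 400.0, "stigmasterol": 0.0}
transformed_counts = {"sitosterol": 1500.0, "campesterol": 400.0, "stigmasterol": 50.0}

ratios = relative_incorporation(transformed_counts, control_counts)
```

Under these assumptions, a ratio above 1 for a given sterol would suggest that more label was recovered in that sterol in the transformed cells than in the control cells.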